A quick intro to the intro to R Lesson Series


This ‘Intro to R Lesson Series’ is brought to you by the Centre for the Analysis of Genome Evolution & Function’s (CAGEF) bioinformatics training initiative. This course was developed based on feedback on the needs and interests of the Department of Cell & Systems Biology and the Department of Ecology and Evolutionary Biology.

This lesson is the fourth in a 6-part series. The idea is that at the end of the series, you will be able to import and manipulate your data, make exploratory plots, perform some basic statistical tests, test a regression model, and make some even prettier plots and documents to share your results.


How do we get there? In the last lesson we learned how to make all sorts of plots - from simple data exploration to interactive plots. Today’s lesson is data cleaning and string manipulation; this is really the battleground of coding - getting your data into the format where you can analyse it. Then we will learn how to do t-tests and perform regression and modeling in R. And lastly, we will learn to write some functions, which really can save you time and help scale up your analyses.


The structure of the class is a code-along style. It is hands on. The lecture AND code we are going through are available on GitHub for download at https://github.com/eacton/CAGEF (Note: repo is private until approved), so you can spend the time coding and not taking notes. As we go along, there will be some challenge questions and multiple choice questions on Socrative. At the end of the class, please fill out the post-lesson survey (https://www.surveymonkey.com/r/VNQZ3KS); it will help me further develop this course and would be greatly appreciated.


Packages Used in This Lesson

The following packages are used in this lesson:

tidyverse (ggplot2, tidyr, dplyr)
(twitteR)*
tidytext
viridis

*Used to generate the tweet tables used in this lesson. It is not necessary for you to install this - you can work from the tables. If you want to create these files - the code is here …………….(insert link).

Please install and load these packages for the lesson. In this document I will load each package separately, but I will not be reminding you to install the package. Remember: these packages may be from CRAN OR Bioconductor.


Highlighting

grey background - a package or function or code
italics - an important term or concept
bold - heading or ‘grammar of graphics’ term


Objective: At the end of this session you will be able to use regular expressions to ‘clean’ your data. You will also learn R markdown and be able to render your R code into slides, a pdf, html, a word document, or a notebook.


Data Cleaning or Data Munging or Data Wrangling

Why do we need to do this?

‘Raw’ data is seldom (never) in a useable format. Data in tutorials or demos has already been meticulously filtered, transformed and readied for that specific analysis. How many people have done a tutorial only to find they can’t get their own data in the @*%($! format to use the tool they have just spent an hour learning about???

Data cleaning requires us to:

Some definitions might take this a bit farther and include normalizing data and removing outliers, but I consider data cleaning to be, at the very least, getting data into a format where we can start actively doing ‘the maths or the graphs’, whether that is statistical calculations, normalization or exploratory plots. We have learned how to transform data into a tidy format in Lesson 2, but the prelude to transforming data is doing the grunt work mentioned above. So let’s get to it!


Intro to regular expressions

Regular expressions

“A God-awful and powerful language for expressing patterns to match in text or for search-and-replace. Frequently described as ‘write only’, because regular expressions are easier to write than to read/understand. And they are not particularly easy to write.” - Jenny Bryan


So why does regex get so much flak?

Scary example: how to get an email in different programming languages (http://emailregex.com/). Whatever.


What does the language look like?

Matching by position

Where is the character in the string?

<table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:left;"> Expression </th>
   <th style="text-align:left;"> Meaning </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;font-style: italic;border-right:1px solid;"> ^ </td>
   <td style="text-align:left;width: 40em; "> start of string </td>
  </tr>
  <tr>
   <td style="text-align:left;font-style: italic;border-right:1px solid;"> $ </td>
   <td style="text-align:left;width: 40em; "> end of string </td>
  </tr>
  <tr>
   <td style="text-align:left;font-style: italic;border-right:1px solid;"> \b </td>
   <td style="text-align:left;width: 40em; "> empty string at either edge of a word </td>
  </tr>
  <tr>
   <td style="text-align:left;font-style: italic;border-right:1px solid;"> \B </td>
   <td style="text-align:left;width: 40em; "> empty string that is NOT at the edge of a word </td>
  </tr>
</tbody>
</table>
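To make the position anchors concrete, here is a small sketch using base R's grepl. The strings are made up for illustration and are not part of the lesson data.

```r
# Toy strings invented for this illustration
x <- c("cat", "concat", "cat food", "tomcat sat")

starts_with <- grepl("^cat", x)       # "cat" at the start of the string
ends_with   <- grepl("cat$", x)       # "cat" at the end of the string
whole_word  <- grepl("\\bcat\\b", x)  # "cat" with a word edge on both sides
inside_word <- grepl("\\Bcat", x)     # "cat" NOT preceded by a word edge

# Line the results up next to the strings for a visual check
data.frame(x, starts_with, ends_with, whole_word, inside_word)
```

Note the doubled backslashes: they are needed because the pattern is an R string first and a regular expression second.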

Quantifiers

How many times will a character appear?

<table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:left;"> Expression </th>
   <th style="text-align:left;"> Meaning </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;font-style: italic;border-right:1px solid;"> ? </td>
   <td style="text-align:left;width: 40em; "> 0 or 1 </td>
  </tr>
  <tr>
   <td style="text-align:left;font-style: italic;border-right:1px solid;"> * </td>
   <td style="text-align:left;width: 40em; "> 0 or more </td>
  </tr>
  <tr>
   <td style="text-align:left;font-style: italic;border-right:1px solid;"> + </td>
   <td style="text-align:left;width: 40em; "> 1 or more </td>
  </tr>
  <tr>
   <td style="text-align:left;font-style: italic;border-right:1px solid;"> {n} </td>
   <td style="text-align:left;width: 40em; "> exactly n </td>
  </tr>
  <tr>
   <td style="text-align:left;font-style: italic;border-right:1px solid;"> {n,} </td>
   <td style="text-align:left;width: 40em; "> at least n </td>
  </tr>
  <tr>
   <td style="text-align:left;font-style: italic;border-right:1px solid;"> {,n} </td>
   <td style="text-align:left;width: 40em; "> at most n </td>
  </tr>
  <tr>
   <td style="text-align:left;font-style: italic;border-right:1px solid;"> {n,m} </td>
   <td style="text-align:left;width: 40em; "> between n and m (inclusive) </td>
  </tr>
</tbody>
</table>
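A quick way to get a feel for quantifiers is to test them against strings that differ only in how many times one character repeats. This is a minimal sketch on invented strings; the ^ and $ anchors are there so the whole string has to match.

```r
# Toy strings: "c", then zero to three "a", then "t"
x <- c("ct", "cat", "caat", "caaat")

zero_or_one  <- grepl("^ca?t$", x)      # 0 or 1 "a"
zero_or_more <- grepl("^ca*t$", x)      # 0 or more "a"
one_or_more  <- grepl("^ca+t$", x)      # 1 or more "a"
exactly_two  <- grepl("^ca{2}t$", x)    # exactly 2 "a"
at_least_two <- grepl("^ca{2,}t$", x)   # 2 or more "a"
one_to_two   <- grepl("^ca{1,2}t$", x)  # between 1 and 2 "a"

data.frame(x, zero_or_one, zero_or_more, one_or_more,
           exactly_two, at_least_two, one_to_two)
```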

Classes

What kind of character is it?

<table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:left;"> Expression </th>
   <th style="text-align:left;"> Meaning </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;font-style: italic;border-right:1px solid;"> \w, [A-Za-z0-9_] </td>
   <td style="text-align:left;width: 40em; "> word characters (letters + digits + underscore); [[:alnum:]] matches letters + digits only </td>
  </tr>
  <tr>
   <td style="text-align:left;font-style: italic;border-right:1px solid;"> \d, [0-9], [[:digit:]] </td>
   <td style="text-align:left;width: 40em; "> digits </td>
  </tr>
  <tr>
   <td style="text-align:left;font-style: italic;border-right:1px solid;"> [A-Za-z], [[:alpha:]] </td>
   <td style="text-align:left;width: 40em; "> alphabetical characters </td>
  </tr>
  <tr>
   <td style="text-align:left;font-style: italic;border-right:1px solid;"> \s, [[:space:]] </td>
   <td style="text-align:left;width: 40em; "> whitespace (space, tab, newline) </td>
  </tr>
  <tr>
   <td style="text-align:left;font-style: italic;border-right:1px solid;"> [[:punct:]] </td>
   <td style="text-align:left;width: 40em; "> punctuation </td>
  </tr>
  <tr>
   <td style="text-align:left;font-style: italic;border-right:1px solid;"> [[:lower:]] </td>
   <td style="text-align:left;width: 40em; "> lowercase </td>
  </tr>
  <tr>
   <td style="text-align:left;font-style: italic;border-right:1px solid;"> [[:upper:]] </td>
   <td style="text-align:left;width: 40em; "> uppercase </td>
  </tr>
  <tr>
   <td style="text-align:left;font-style: italic;border-right:1px solid;"> \W, [^A-Za-z0-9_] </td>
   <td style="text-align:left;width: 40em; "> not word characters </td>
  </tr>
  <tr>
   <td style="text-align:left;font-style: italic;border-right:1px solid;"> \S </td>
   <td style="text-align:left;width: 40em; "> not whitespace </td>
  </tr>
  <tr>
   <td style="text-align:left;font-style: italic;border-right:1px solid;"> \D, [^0-9] </td>
   <td style="text-align:left;width: 40em; "> not digits </td>
  </tr>
</tbody>
</table>
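Character classes are easiest to compare side by side. The sketch below tests a handful of single characters (chosen for illustration) against several classes. One thing it shows is that a range written as [A-z] is wider than it looks: it also covers the characters that sit between Z and a in ASCII, such as the underscore, so [A-Za-z] is the safer way to write 'letters only'.

```r
# One character per element so each result maps to exactly one character
chars <- c("a", "Z", "7", "_", " ", "!")

wide_range <- grepl("[A-z]", chars)        # letters, but also "_" (sits between Z and a)
letters_ok <- grepl("[A-Za-z]", chars)     # letters only
word_char  <- grepl("\\w", chars)          # letters, digits and underscore
alnum      <- grepl("[[:alnum:]]", chars)  # letters and digits, no underscore
space      <- grepl("\\s", chars)          # whitespace
punct      <- grepl("[[:punct:]]", chars)  # punctuation

data.frame(chars, wide_range, letters_ok, word_char, alnum, space, punct)
```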

Operators

Helper actions to match your characters.

<table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:left;"> Expression </th>
   <th style="text-align:left;"> Meaning </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;font-style: italic;border-right:1px solid;"> | </td>
   <td style="text-align:left;width: 40em; "> or </td>
  </tr>
  <tr>
   <td style="text-align:left;font-style: italic;border-right:1px solid;"> . </td>
   <td style="text-align:left;width: 40em; "> matches any single character </td>
  </tr>
  <tr>
   <td style="text-align:left;font-style: italic;border-right:1px solid;"> [  ] </td>
   <td style="text-align:left;width: 40em; "> matches ANY of the characters inside the brackets </td>
  </tr>
  <tr>
   <td style="text-align:left;font-style: italic;border-right:1px solid;"> [ - ] </td>
   <td style="text-align:left;width: 40em; "> matches a RANGE of characters inside the brackets </td>
  </tr>
  <tr>
   <td style="text-align:left;font-style: italic;border-right:1px solid;"> [^ ] </td>
   <td style="text-align:left;width: 40em; "> matches any character EXCEPT those inside the bracket </td>
  </tr>
  <tr>
   <td style="text-align:left;font-style: italic;border-right:1px solid;"> ( ) </td>
   <td style="text-align:left;width: 40em; "> grouping - used for _backreferencing_ </td>
  </tr>
</tbody>
</table>
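Here is a short sketch of the operators in action, again on made-up strings. The last line shows backreferencing: each pair of parentheses captures part of the match, and \\1 and \\2 in the replacement refer back to the first and second captured groups.

```r
# Toy strings invented for this illustration
x <- c("gray", "grey", "grizzly", "gr3y")

either     <- grepl("gr(a|e)y", x)    # "a" OR "e" in the third position
any_of     <- grepl("gr[ae]y", x)     # same idea with a bracket set
any_char   <- grepl("gr.y", x)        # any single character in the third position
not_digit  <- grepl("gr[^0-9]y", x)   # anything EXCEPT a digit in the third position

# Backreference: swap the two captured words around the comma
swapped <- gsub("(\\w+), (\\w+)", "\\2 \\1", "Musk, Elon")
swapped
```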

Escape characters

Sometimes a character is just a character… Escaping allows you to use a character ‘as is’ rather than with its special function. In R, a regex is evaluated as a string before it is evaluated as a regular expression, and a backslash is used to escape in strings as well - so you really need 2 backslashes to escape, say, a $ sign ("\\$").

<table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:left;"> Expression </th>
   <th style="text-align:left;"> Meaning </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;font-style: italic;border-right:1px solid;"> \ </td>
   <td style="text-align:left;width: 40em; "> escape! necessary to use special meta-characters (*, +, $, ^, ., ?, |, \, [, ], {, }, (, )) [Note that the backslash is a meta-character as well] </td>
  </tr>
</tbody>
</table>

Trouble-shooting with escaping meta-characters means adding backslashes until something works. (Joking/not joking)
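If adding backslashes until something works feels too random, this small sketch shows the options for matching a literal $ sign. The price strings are invented for illustration.

```r
# Toy strings invented for this illustration
prices <- c("$5", "5 dollars", "cost: $12")

unescaped <- grepl("$", prices)                # "$" means end of string: matches everything
escaped   <- grepl("\\$", prices)              # two backslashes in R give a literal "$"
bracketed <- grepl("[$]", prices)              # inside brackets "$" is already literal
fixed     <- grepl("$", prices, fixed = TRUE)  # fixed = TRUE switches regex off entirely

data.frame(prices, unescaped, escaped, bracketed, fixed)

# The R string "\\$" holds only two characters: a backslash and a dollar sign
nchar("\\$")
```

fixed = TRUE is handy when the pattern contains no regex at all, since nothing needs escaping in that case.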



Data Cleaning with Base R (AKA What is Elon Musk up to anyways?)

Let’s take this cacophony of characters we’ve just learned about and perform some basic data cleaning tasks with an actual (fun?) messy data set. I have scraped Elon Musk’s latest tweets from Twitter. The code to do this is in the Lesson 4 markdown file if you are curious and/or want to creep someone on Twitter.

Let’s read in the set of tweets, take a look at the structure of the data, and use ‘tidyverse’ to order the data by the most popular (favorited) tweets. Let’s check out the top 5 favorite tweets.

library(tidyverse)
elon_tweets_df <- read.delim("data/elon_tweets_df.txt", sep = "\t", stringsAsFactors = FALSE)
EOF within quoted string
str(elon_tweets_df)
'data.frame':   348 obs. of  16 variables:
 $ text         : chr  "@Complex This is false" "https://t.co/UWJK1LwgKf" "@rosechehrazi Oh, it’s on …" "Most people don’t know there’s a whole box of Easter eggs with every Tesla. Just tap logo on center screen &amp; wait.… https:/ ...
 $ favorited    : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
 $ favoriteCount: int  18526 10579 3583 26710 2975 2132 11397 25003 2590 3625 ...
 $ replyToSN    : chr  "Complex" "elonmusk" "rosechehrazi" NA ...
 $ created      : chr  "2018-04-04 22:25:31" "2018-04-04 17:24:48" "2018-04-04 17:16:57" "2018-04-04 17:16:13" ...
 $ truncated    : logi  FALSE FALSE FALSE TRUE FALSE TRUE ...
 $ replyToSID   : num  9.82e+17 9.82e+17 9.82e+17 NA 9.81e+17 ...
 $ id           : num  9.82e+17 9.82e+17 9.82e+17 9.82e+17 9.81e+17 ...
 $ replyToUID   : num  1.30e+07 4.42e+07 2.39e+09 NA 1.15e+08 ...
 $ statusSource : chr  "<a href=\\http://twitter.com/download/iphone\\ rel=\\nofollow\\>Twitter for iPhone</a>" "<a href=\\http://twitter.com/download/iphone\\ rel=\\nofollow\\>Twitter for iPhone</a>" "<a href=\\http://twitter.com/download/iphone\\ rel=\\nofollow\\>Twitter for iPhone</a>" "<a href=\\http://twitter.com/download/iphone\\ rel=\\nofollow\\>Twitter for iPhone</a>" ...
 $ screenName   : chr  "elonmusk" "elonmusk" "elonmusk" "elonmusk" ...
 $ retweetCount : int  2830 738 172 2103 325 186 634 2216 89 79 ...
 $ isRetweet    : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
 $ retweeted    : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
 $ longitude    : logi  NA NA NA NA NA NA ...
 $ latitude     : logi  NA NA NA NA NA NA ...
elon_tweets_df <- elon_tweets_df %>% arrange(desc(favoriteCount))
elon_tweets_df$text[1:5]
[1] "0 to 100 km/h in 1.9 sec https://t.co/xTOTDGuwQj"                                                                                            
[2] "Apparently, some customs agencies are saying they won’t allow shipment of anything called a “Flamethrower”. To solv… https://t.co/OCtjvdXo95"
[3] "The rumor that I’m secretly creating a zombie apocalypse to generate demand for flamethrowers is completely false"                           
[4] "Nuclear alien UFO from North Korea https://t.co/GUIHpKkkp5"                                                                                  
[5] "Ok, who leaked my selfie!? https://t.co/fYKXbix8jw"                                                                                          
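A side note on the 'EOF within quoted string' warning printed after read.delim above. One common cause of that warning is an unmatched quotation mark inside a text column: read.delim treats it as the start of a quoted field and swallows everything up to the end of the file. Whether that is what happened with the tweet file is an assumption on my part; the sketch below only reproduces the effect on a small temporary file with invented contents, and shows that quote = "" turns quote handling off.

```r
# Build a small tab-delimited file in a temporary location (invented data)
tmp <- tempfile(fileext = ".txt")
rows <- paste0("tweet number ", 1:10, "\t", 1:10 * 100)
rows[8] <- "He said \"hello and never closed the quote\t800"  # unmatched quote
writeLines(c("text\tfavoriteCount", rows), tmp)

# Default quote handling: the unmatched quote swallows the rest of the file
default_read <- suppressWarnings(
  read.delim(tmp, sep = "\t", stringsAsFactors = FALSE)
)

# quote = "" treats quotation marks as ordinary characters
no_quote_read <- read.delim(tmp, sep = "\t", quote = "", stringsAsFactors = FALSE)

nrow(default_read)   # fewer rows than were written
nrow(no_quote_read)  # all 10 rows

unlink(tmp)
```

If you see this warning with your own data, comparing the number of rows read against the number of lines in the file is a quick sanity check.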

Our end goal is going to be to look at the top 50 words in Elon Musk’s tweets. I emphasize words, because I don’t want urls, or hashtags, or other tags. I also don’t want punctuation or spaces. I want to extract just the words from tweets. First, I want to remove the tags from the beginning of words. I am going to save my regular expression into an object - so we can use it again later.

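What this expression says is that I want to find matches for a hashtag symbol (#) on its own OR an at sign (@) followed by at least one word character. The alternation (|) splits the whole pattern, so the \w+ belongs only to the @ branch. grep is a function that allows us to match our pattern (our expression) to a character vector.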

tags <- "#|@\\w+"
grep(pattern = tags, x = elon_tweets_df$text)
input string 10 is invalid in this locale
input string 118 is invalid in this locale
input string 156 is invalid in this locale
input string 219 is invalid in this locale
input string 224 is invalid in this locale
  [1]  27  34  45  49  56  57  58  70  73  74  81  82  87  89  93  98 103 117 119 122 123 129 131 132 134 135 142 147 148 151 154
 [32] 158 159 160 164 166 169 171 172 173 174 175 177 179 181 182 183 184 185 187 188 189 192 193 194 196 197 198 201 202 203 204
 [63] 205 206 207 208 209 210 211 212 213 214 215 216 217 218 220 221 222 223 225 226 227 228 229 230 231 232 233 234 235 236 237
 [94] 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268
[125] 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 293 294 295 296 297 298 299 300
[156] 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331
[187] 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348

We can see that grep returns the index of the match. We have a number of entries that include tags. However, if we want to return the tweet itself instead of the index, we can use the argument value = TRUE. It is a good idea to do a visual inspection of your result to make sure your matches or substitutions are working the way you expected. In this case, it looks like each tweet does have a tag. We can then use gsub to replace that pattern (our tags) with nothing (an empty string).

grep(tags, elon_tweets_df$text, value = TRUE) %>% head()
input string 10 is invalid in this locale
input string 118 is invalid in this locale
input string 156 is invalid in this locale
input string 219 is invalid in this locale
input string 224 is invalid in this locale
[1] "@FortuneTech Do it"                                                                                                                          
[2] "Flight profile #FalconHeavy #SpaceX https://t.co/LlfWXqUaLP"                                                                                 
[3] "@angilly I don’t get the little ship thing. You can’t show up at Mars in something the size of a rowboat. What if t… https://t.co/Aj0zv8Lwdf"
[4] "Launch auto-sequence initiated (aka the holy mouse-click) for 3:45 liftoff #FalconHeavy"                                                     
[5] "@brianacton What’s Facebook?"                                                                                                                
[6] "@VentureBeat @kharijohnson We’ve never advertised with FB. None of my companies buy advertising or pay famous peopl… https://t.co/AhG6MbsXHP"
elon_tweets_df$text <- gsub(pattern = tags, replacement = "", elon_tweets_df$text)
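It is worth checking on a couple of strings exactly what the tags pattern removes. In the sketch below (two tweets copied from the output above, without their urls), the pattern strips the whole @handle but only the # symbol of a hashtag, leaving the hashtag's word behind. The grouped pattern is a hypothetical alternative, shown only for comparison, that would remove the hashtag word as well.

```r
tags <- "#|@\\w+"  # the pattern used in the lesson

tweets <- c("Flight profile #FalconHeavy #SpaceX", "@FortuneTech Do it")

# "#" is removed on its own; "@" is removed together with the handle
lesson_result <- gsub(tags, "", tweets)
lesson_result

# Hypothetical alternative: group the symbols so \\w+ applies to both
grouped <- "(#|@)\\w+"
grouped_result <- gsub(grouped, "", tweets)
grouped_result
```

Which behaviour you want depends on the question: here the hashtag words are kept because they count as words Elon Musk used.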

It also looks like anything that was an apostrophe has been replaced with a string of numbers and backslashes. This is due to the fact that tweets are encoded in UTF-16 and converted to UTF-8. Other things that have character codes, like emojis, will also be encoded differently. Here is an example of emoji encoding: https://raw.githubusercontent.com/today-is-a-good-day/Emoticons/master/emDict.csv. Since we are going to remove punctuation, we are going to ‘convert’ the encoding by substituting it with nothing (again, an empty character string). This is not something you will have to deal with on a daily basis, but character encoding is something to be aware of, especially when scraping data from the web.

iconv(elon_tweets_df$text, "UTF-8", "ASCII", sub="") %>% head()
[1] "0 to 100 km/h in 1.9 sec https://t.co/xTOTDGuwQj"                                                                                        
[2] "Apparently, some customs agencies are saying they wont allow shipment of anything called a Flamethrower. To solv https://t.co/OCtjvdXo95"
[3] "The rumor that Im secretly creating a zombie apocalypse to generate demand for flamethrowers is completely false"                        
[4] "Nuclear alien UFO from North Korea https://t.co/GUIHpKkkp5"                                                                              
[5] "Ok, who leaked my selfie!? https://t.co/fYKXbix8jw"                                                                                      
[6] "\\If one day, my words are against science, choose science.\\\nMustafa Kemal Atatrk"                                                     
elon_tweets_df$text <- iconv(elon_tweets_df$text, "UTF-8", "ASCII", sub = "")
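To see what the sub argument of iconv is doing, here is a minimal sketch on three invented strings. The non-ASCII characters are written as \u escapes so the example does not depend on how this file itself is encoded.

```r
# A curly apostrophe (U+2019) and a u-umlaut (U+00FC), written as escapes
x <- c("don\u2019t", "Atat\u00fcrk", "plain")

# sub = "" drops every byte that cannot be represented in ASCII
dropped <- iconv(x, from = "UTF-8", to = "ASCII", sub = "")
dropped

# Without sub, a string that cannot be converted becomes NA instead
unconverted <- iconv(x, from = "UTF-8", to = "ASCII")
unconverted
```

Dropping characters is fine for our word counts, but notice that it silently changes names (Atatürk loses a letter), so it is not a general-purpose fix.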

Our next step would be to remove urls. This is a bit tricky. We could be looking for http:// or https:// followed by we don’t know what (some combination of letters, numbers and forward slashes).

We can check out which tweets have urls using grep as we did previously. We can also use grepl to get a logical response indicating whether a tweet has a url or not. That way, if you wanted to grab all of the urls that Elon Musk suggests visiting, you can filter with grepl to select all of the tweets where it is TRUE that a url is present.

However, we are going to continue our pattern of substituting what we don’t want with an empty character string.

url <- "http[s]?://[[:alnum:].\\/]+"
grep(url, elon_tweets_df$text, value = TRUE) %>% head()
[1] "0 to 100 km/h in 1.9 sec https://t.co/xTOTDGuwQj"                                                                                        
[2] "Apparently, some customs agencies are saying they wont allow shipment of anything called a Flamethrower. To solv https://t.co/OCtjvdXo95"
[3] "Nuclear alien UFO from North Korea https://t.co/GUIHpKkkp5"                                                                              
[4] "Ok, who leaked my selfie!? https://t.co/fYKXbix8jw"                                                                                      
[5] "https://t.co/pNElNTmcKf"                                                                                                                 
[6] "Falcon Heavy at the Cape https://t.co/hizfDVsU7X"                                                                                        
grepl(url, elon_tweets_df$text) %>% head()
[1]  TRUE  TRUE FALSE  TRUE  TRUE FALSE
elon_urls <- elon_tweets_df %>% filter(grepl(url, text))
elon_tweets_df$text <- gsub(pattern = url, replacement = "", elon_tweets_df$text)
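Filtering with grepl keeps the whole tweet. If you wanted only the urls themselves, one base R option is regmatches together with gregexpr. This is a sketch on three strings; the second and third are invented, and the example.* addresses are placeholders.

```r
url <- "http[s]?://[[:alnum:].\\/]+"  # the pattern used in the lesson

tweets <- c("0 to 100 km/h in 1.9 sec https://t.co/xTOTDGuwQj",
            "no link here",
            "two links http://a.example/x and https://b.example/y")

has_url <- grepl(url, tweets)  # one TRUE/FALSE per tweet
has_url

# gregexpr finds every match; regmatches pulls the matched text out
found <- regmatches(tweets, gregexpr(url, tweets))
all_urls <- unlist(found)
all_urls
```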

Lastly, we are going to get rid of trailing spaces, numbers, and punctuation all at the same time. Trailing spaces are the ones found at the very end of our tweet string.

trail <- "[ ]+$|[0-9]*|[[:punct:]]"
grep(trail, elon_tweets_df$text, value = TRUE) %>% head()
[1] "0 to 100 km/h in 1.9 sec "                                                                                        
[2] "Apparently, some customs agencies are saying they wont allow shipment of anything called a Flamethrower. To solv "
[3] "The rumor that Im secretly creating a zombie apocalypse to generate demand for flamethrowers is completely false" 
[4] "Nuclear alien UFO from North Korea "                                                                              
[5] "Ok, who leaked my selfie!? "                                                                                      
[6] "\\If one day, my words are against science, choose science.\\\nMustafa Kemal Atatrk"                              
elon_tweets_df$text <- gsub(pattern = trail, replacement = "", elon_tweets_df$text)
elon_tweets_df$text[1:5]
[1] " to  kmh in  sec"                                                                                                
[2] "Apparently some customs agencies are saying they wont allow shipment of anything called a Flamethrower To solv"  
[3] "The rumor that Im secretly creating a zombie apocalypse to generate demand for flamethrowers is completely false"
[4] "Nuclear alien UFO from North Korea"                                                                              
[5] "Ok who leaked my selfie"                                                                                         
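One detail of the trail pattern is worth a closer look. [0-9]* means 'zero or more digits', and zero digits can be found in any string, so grep with this pattern returns every tweet (which is why tweet 3 shows up in the grep output above even though there is nothing to remove from it). The sketch below, on two invented strings, compares * with + for the digit part only.

```r
# Toy strings invented for this illustration
x <- c("no digits here", "0 to 100 km/h")

star_match <- grepl("[0-9]*", x)  # zero digits is still a match
plus_match <- grepl("[0-9]+", x)  # requires at least one digit

star_match
plus_match

# For substitution with an empty string the two give the same result
same_after_gsub <- identical(gsub("[0-9]*", "", x), gsub("[0-9]+", "", x))
same_after_gsub
```

So the substitution in the lesson does what we want, but + is the better choice if you also use the pattern to find which tweets contain digits.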

It looks like everything worked except that a spacing issue was created by removing all of the numbers. Let’s take all of the places where there are 2 or more spaces created and substitute them with just one space.

space <- "\\s{2,}"
grep(space, elon_tweets_df$text, value = TRUE) %>% head()
[1] " to  kmh in  sec"                                                                                                                      
[2] "Tesla Goes Bankrupt\nPalo Alto California April    Despite intense efforts to raise money including a las"                             
[3] "If you liked tonights launch you will really like Falcon Heavy next month  rocket cores amp X thrust  cores re"                        
[4] "Elon was found passed out against a Tesla Model  surrounded by Teslaquilla bottles the tracks of dried tears s"                        
[5] "Turns out joking about being a rock star because of digging tunnels through uh rock  hello is deeply underappreciated"                 
[6] "Todays Falcon launch carries  SpaceX test satellites for global broadband If successful Starlink constellation will serve least served"
elon_tweets_df$text <- gsub(pattern = space, replacement = " ", elon_tweets_df$text)
elon_tweets_df$text[1:5]
[1] " to kmh in sec"                                                                                                  
[2] "Apparently some customs agencies are saying they wont allow shipment of anything called a Flamethrower To solv"  
[3] "The rumor that Im secretly creating a zombie apocalypse to generate demand for flamethrowers is completely false"
[4] "Nuclear alien UFO from North Korea"                                                                              
[5] "Ok who leaked my selfie"                                                                                         

Challenge

We also have a leading whitespace where we removed a number. How would we remove that whitespace?





Onwards!! Let’s break the texts down into individual words, so we can see which words are used most often. We can use the base R function strsplit to do this. In this case, we want to split our tweets into words by splitting on spaces.

strsplit(elon_tweets_df$text, split = " ")

Note that the output of this function is some horrible list object.

str(strsplit(elon_tweets_df$text, split = " "))

Luckily there is an unlist function which recursively will go through lists to simplify their elements into a vector. Let’s try it and check the structure of our output. We will save this to an object called ‘words’.
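As a minimal sketch of the split-then-unlist step, here it is on two of the cleaned tweets from above. The object is called words_demo (a name made up here) so it does not overwrite anything from the lesson.

```r
tweets <- c("Nuclear alien UFO from North Korea", "Ok who leaked my selfie")

# strsplit returns a list with one character vector per tweet
pieces <- strsplit(tweets, split = " ")
str(pieces)

# unlist flattens the list into a single character vector
words_demo <- unlist(pieces)
str(words_demo)
length(words_demo)
```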

tail(words, 50)
 [1] "true"               "na"                 "na"                 "a"                  "href"               "relnofollowtwitter"
 [7] "for"                "iphonea"            "elonmusk"           "false"              "false"              "na"                
[13] "na"                 "several"            "tons"               "of"                 "force"              "on"                
[19] "each"               "fin"                "at"                 "high"               "angles"             "of"                
[25] "attack"             "amp"                "peak"               "heating"            "is"                 "the"               
[31] "cube"               "of"                 "speed"              "so"                 "a"                  "mach"              
[37] "r"                  "false"              "viss"               "true"               "a"                  "href"              
[43] "relnofollowtwitter" "for"                "iphonea"            "elonmusk"           "false"              "false"             
[49] "na"                 "na"                

Great! But… I noticed that we missed some \n (newline) and \t (tab) characters.


Challenge

Newline and tab characters are separating 2 words. Split these words apart and get rid of the newline character. Convert all of our character strings to lowercase (I haven’t shown you how to do this, but I believe in your google-fu). Check the first and last 50 words to see if anything else is amiss.





There are still a few problems: words that were cut off like ‘solv’, ‘flamethrower’ and ‘flamethrowers’ being counted as different words, or ‘north’ and ‘korea’ belonging together for context. If we were serious about this dataset we would need to resolve these issues. We also have some html and twitter-specific tags that we will deal with shortly. However, we are going to move ahead and count the number of occurrences of each word and order them by frequency.
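One way to count and order words, sketched in base R on an invented vector, is table followed by sort. This is an illustration under assumptions rather than the code used to produce the lesson's results; the column names words and n are chosen to mirror the ones that appear later in the lesson.

```r
# Invented word vector standing in for the real one
words_demo <- c("tesla", "rocket", "tesla", "the", "the", "the", "mars")

# table counts each distinct word; sort puts the most frequent first
freq <- sort(table(words_demo), decreasing = TRUE)

# Store as a data frame with one row per word
word_counts <- data.frame(words = names(freq),
                          n = as.integer(freq),
                          stringsAsFactors = FALSE)
head(word_counts)
```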

Wow. We have discovered that people use prepositions and conjunctions. We also have words that are unrelated to content, such as html jargon, or things like ‘na’ and ‘false’. The tidytext package includes a list of ‘stop words’ that can be used to get rid of words that are unlikely to contain information for us. However, we will have to add to this list.

The premade dataframe is called stop_words. We can save it as an object and add to it by making a dataframe of the words we want to add. We can call our lexicon ‘custom’.

stop <- bind_rows(stop, add_stop)
binding character and factor vector, coercing into character vector
binding character and factor vector, coercing into character vector

To remove these stop words from our list, we perform an anti-join (from Lesson 3).

words <- anti_join(data.frame(words), stop, by=c("words" = "word"))
Column `words`/`word` joining factor and character vector, coercing into character vector
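The two steps above (extend the stop word list, then anti-join) can be sketched end to end on toy data. To keep the example independent of tidytext, stop_demo is a tiny hand-made stand-in with the same two columns as stop_words (word and lexicon); all object names and words here are invented for illustration.

```r
library(dplyr)

# Hand-made stand-in for tidytext's stop_words (same column names)
stop_demo <- tibble(word = c("the", "of", "a"), lexicon = "demo")

# Our own additions, labelled with the lexicon 'custom'
add_stop_demo <- tibble(word = c("na", "false", "href"), lexicon = "custom")

# Stack the two lists into one
stop_all <- bind_rows(stop_demo, add_stop_demo)

# Invented word counts to filter
word_counts <- tibble(words = c("the", "tesla", "na", "rocket"),
                      n = c(10L, 4L, 3L, 2L))

# anti_join keeps only the rows of word_counts with no match in stop_all
kept <- anti_join(word_counts, stop_all, by = c("words" = "word"))
kept
```

Building both tables as character columns from the start also avoids the factor/character coercion messages shown above.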

‘boring’, ‘falcon’, ‘tesla’, ‘rocket’, ‘launch’, ‘flamethrower’, ‘cars’, ‘spacex’, ‘tunnels’, and ‘mars’ and ‘ai’ are a bit further down. There are a few words that look like they should be added to the ‘stop words’ list (dont, doesnt, didnt, im), but we’ll work with this for now.

We can make a word cloud out of the top 50 words, which will be sized according to their frequency. I am starting with the first word after Elon Musk’s twitter handle. The default color is black, but we can use our viridis package (Lesson 3) to have a pleasing color palette.

library("wordcloud")
library("viridis")

words[2:51,] %>%
    with(wordcloud(words, n, ordered.colors = TRUE, colors = viridis(50), use.r.layout = TRUE))

Data Cleaning with stringr/stringi (AKA What is Trump up to anyways?)

We are going to do the exact same data cleaning with the stringr package using Trump’s tweets. The syntax is a little different, but it is pretty intuitive once you get started. All stringr functions can be found using str_ + Tab. Again, we will start by loading the dataset and looking at the top 5 favorite tweets.

iconv(trump_tweets_df$text, "UTF-8", "ASCII", sub="") %>% head()
[1] "The Caravan is largely broken up thanks to the strong immigration laws of Mexico and their willingness to use them " 
[2] "Still Rising: Rasmussen Poll Shows Donald Trump Approval Ratings Now at 51 Percent "                                 
[3] "Today we honor Dr. Martin Luther King, Jr. on the 50th anniversary of his assassination. Earlier this year I spoke " 
[4] "Our thoughts and prayers are with the four U.S. Marines from the 3rd Marine Aircraft Wing who lost their lives in y "
[5] "When youre already $500 Billion DOWN, you cant lose!"                                                                
[6] "We are not in a trade war with China, that war was lost many years ago by the foolish, or incompetent, people who r "

The first thing that we did was look for tags. The arguments are switched in stringr relative to the base functions: the first argument is the character string we are searching, and the second argument is the pattern we are matching. str_extract returns the matched text itself (the first match in each string), with NA for strings that have no match. This is similar to grep with value = TRUE, except that grep returns the whole matching string rather than just the matched part.

str_extract(string = trump_tweets_df$text, pattern = tags)
Error in type(pattern) : object 'tags' not found

str_detect is similar to grepl returning TRUE or FALSE if a match is or isn’t found, respectively.

str_detect(trump_tweets_df$text, tags)
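Here is how the stringr functions line up with their base R cousins, sketched on three short strings (two taken from outputs in this lesson, with the tags pattern redefined so the example runs on its own).

```r
library(stringr)

tags <- "#|@\\w+"  # same pattern as in the base R section

tweets <- c("@FortuneTech Do it", "HAPPY EASTER", "Flight profile #FalconHeavy")

detected  <- str_detect(tweets, tags)   # like grepl: TRUE/FALSE per string
extracted <- str_extract(tweets, tags)  # first match per string, NA if none
positions <- str_which(tweets, tags)    # like grep: index of matching strings
subsetted <- str_subset(tweets, tags)   # like grep(value = TRUE): whole strings

detected
extracted
positions
subsetted
```

Note that str_extract gives just "#" for the last tweet, because the # branch of the pattern matches on its own.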

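Let’s be ambitious and try to remove tags, urls, newline and tab characters and numbers all in one go. str_remove automatically replaces the match with an empty character string. Note that str_remove only removes the first match in each string; str_remove_all removes every match.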

trump_tweets_df$text[1:10]
 [1] "Crazy Joe Biden is trying to act like a tough guy Actually he is weak both mentally and physically and yet he t"       
 [2] "Lowest rated Oscars in HISTORY Problem is we dont have Stars anymore  except your President just kidding of course"    
 [3] "HAPPY EASTER"                                                                                                          
 [4] "THE SECOND AMENDMENT WILL NEVER BE REPEALED As much as Democrats would like to see this happen and despite the wo"     
 [5] "I will be strongly pushing Comprehensive Background Checks with an emphasis on Mental Health Raise age to  and e"      
 [6] "Kim Jong Un talked about denuclearization with the South Korean Representatives not just a freeze Also no missil"      
 [7] "Andrew McCabe FIRED a great day for the hard working men and women of the FBI  A great day for Democracy Sanctim"      
 [8] "THE HOUSE INTELLIGENCE COMMITTEE HAS AFTER A  MONTH LONG INDEPTH INVESTIGATION FOUND NO EVIDENCE OF COLLUSION"         
 [9] "Do you think the three UCLA Basketball Players will say thank you President Trump? They were headed for  years in jail"
[10] "I am considering a VETO of the Omnibus Spending Bill based on the fact that the  plus DACA recipients have b"          

stringr has its own function for trimming whitespace, str_trim, which you can use to specify whether you want leading or trailing whitespace trimmed, or both.

See how we have a couple extra spaces in the middle of some of our strings? str_squish will take care of that for us, leaving only a single space between words.
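
A small sketch of the two whitespace helpers on a made-up string (the object name `messy` is hypothetical):

```r
library(stringr)

# Made-up string with leading, trailing and doubled-up spaces
messy <- "  raise age to  and end sale of   bump stocks "

# Trim one side only, or both sides; spaces in the middle are untouched
left_trimmed <- str_trim(messy, side = "left")
both_trimmed <- str_trim(messy, side = "both")

# Trim both ends AND collapse internal runs of whitespace to one space
squished <- str_squish(messy)
```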

trump_tweets_df$text[1:10]
 [1] "crazy joe biden is trying to act like a tough guy actually he is weak both mentally and physically and yet he t"     
 [2] "lowest rated oscars in history problem is we dont have stars anymore except your president just kidding of course"   
 [3] "happy easter"                                                                                                        
 [4] "the second amendment will never be repealed as much as democrats would like to see this happen and despite the wo"   
 [5] "i will be strongly pushing comprehensive background checks with an emphasis on mental health raise age to and e"     
 [6] "kim jong un talked about denuclearization with the south korean representatives not just a freeze also no missil"    
 [7] "andrew mccabe fired a great day for the hard working men and women of the fbi a great day for democracy sanctim"     
 [8] "the house intelligence committee has after a month long indepth investigation found no evidence of collusion"        
 [9] "do you think the three ucla basketball players will say thank you president trump they were headed for years in jail"
[10] "i am considering a veto of the omnibus spending bill based on the fact that the plus daca recipients have b"         

All that’s left is to convert all characters to lowercase, and then we can see the top Trump words!
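
Lowercasing matters because word counts are case-sensitive: ‘FAKE’ and ‘fake’ would otherwise be tallied separately. A minimal sketch on made-up strings, using stringr's `str_to_lower` (the base equivalent is `tolower`):

```r
library(stringr)

# Made-up strings in mixed case
shouty <- c("HAPPY EASTER", "Crazy Joe Biden")

# Convert every character to lowercase
lowered <- str_to_lower(shouty)
```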

To get our tweets into a word list we use str_split, the stringr counterpart to strsplit, still splitting by the spaces between words. The argument simplify = FALSE returns a list of character vectors, which we then unlist.

words <- unlist(str_split(trump_tweets_df$text, pattern = " ", simplify = FALSE))
words
   [1] "crazy"            "joe"              "biden"            "is"               "trying"          
   [6] "to"               "act"              "like"             "a"                "tough"           
  [11] "guy"              "actually"         "he"               "is"               "weak"            
  [16] "both"             "mentally"         "and"              "physically"       "and"             
  [21] "yet"              "he"               "t"                "lowest"           "rated"           
  [26] "oscars"           "in"               "history"          "problem"          "is"              
  [31] "we"               "dont"             "have"             "stars"            "anymore"         
  [36] "except"           "your"             "president"        "just"             "kidding"         
  [41] "of"               "course"           "happy"            "easter"           "the"             
  [46] "second"           "amendment"        "will"             "never"            "be"              
  [51] "repealed"         "as"               "much"             "as"               "democrats"       
  [56] "would"            "like"             "to"               "see"              "this"            
  [61] "happen"           "and"              "despite"          "the"              "wo"              
  [66] "i"                "will"             "be"               "strongly"         "pushing"         
  [71] "comprehensive"    "background"       "checks"           "with"             "an"              
  [76] "emphasis"         "on"               "mental"           "health"           "raise"           
  [81] "age"              "to"               "and"              "e"                "kim"             
  [86] "jong"             "un"               "talked"           "about"            "denuclearization"
  [91] "with"             "the"              "south"            "korean"           "representatives" 
  [96] "not"              "just"             "a"                "freeze"           "also"            
 [101] "no"               "missil"           "andrew"           "mccabe"           "fired"           
 [106] "a"                "great"            "day"              "for"              "the"             
 [111] "hard"             "working"          "men"              "and"              "women"           
 [116] "of"               "the"              "fbi"              "a"                "great"           
 [121] "day"              "for"              "democracy"        "sanctim"          "the"             
 [126] "house"            "intelligence"     "committee"        "has"              "after"           
 [131] "a"                "month"            "long"             "indepth"          "investigation"   
 [136] "found"            "no"               "evidence"         "of"               "collusion"       
 [141] "do"               "you"              "think"            "the"              "three"           
 [146] "ucla"             "basketball"       "players"          "will"             "say"             
 [151] "thank"            "you"              "president"        "trump"            "they"            
 [156] "were"             "headed"           "for"              "years"            "in"              
 [161] "jail"             "i"                "am"               "considering"      "a"               
 [166] "veto"             "of"               "the"              "omnibus"          "spending"        
 [171] "bill"             "based"            "on"               "the"              "fact"            
 [176] "that"             "the"              "plus"             "daca"             "recipients"      
 [181] "have"             "b"                "the"              "great"            "billy"           
 [186] "graham"           "is"               "dead"             "there"            "was"             
 [191] "nobody"           "like"             "him"              "he"               "will"            
 [196] "be"               "missed"           "by"               "christians"       "and"             
 [201] "all"              "religions"        "a"                "very"             "special"         
 [206] "man"              "the"              "fake"             "news"             "is"              
 [211] "beside"           "themselves"       "that"             "mccabe"           "was"             
 [216] "caught"           "called"           "out"              "and"              "fired"           
 [221] "how"              "many"             "hundreds"         "of"               "thousands"       
 [226] "of"               "alec"             "baldwin"          "whose"            "dying"           
 [231] "mediocre"         "career"           "was"              "saved"            "by"              
 [236] "his"              "terrible"         "impersonation"    "of"               "me"              
 [241] "on"               "snl"              "now"              "says"             "playing"         
 [246] "me"               "our"              "nation"           "was"              "founded"         
 [251] "by"               "farmers"          "our"              "independence"     "was"             
 [256] "won"              "by"               "farmers"          "and"              "our"             
 [261] "continent"        "was"              "tamed"            "by"               "farmers"         
 [266] "our"              "austin"           "bombing"          "suspect"          "is"              
 [271] "dead"             "great"            "job"              "by"               "law"             
 [276] "enforcement"      "and"              "all"              "concerned"        "so"              
 [281] "much"             "fake"             "news"             "never"            "been"            
 [286] "more"             "voluminous"       "or"               "more"             "inaccurate"      
 [291] "but"              "through"          "it"               "all"              "our"             
 [296] "country"          "is"               "doing"            "great"            "while"           
 [301] "in"               "the"              "philippines"      "i"                "was"             
 [306] "forced"           "to"               "watch"            "which"            "i"               
 [311] "have"             "not"              "done"             "in"               "months"          
 [316] "and"              "again"            "realized"         "how"              "bad"             
 [321] "and"              "fake"             "it"               "is"               "loser"           
 [326] "i"                "called"           "president"        "putin"            "of"              
 [331] "russia"           "to"               "congratulate"     "him"              "on"              
 [336] "his"              "election"         "victory"          "in"               "past"            
 [341] "obama"            "called"           "him"              "also"             "th"              
 [346] "why"              "did"              "the"              "obama"            "administration"  
 [351] "start"            "an"               "investigation"    "into"             "the"             
 [356] "trump"            "campaign"         "with"             "zero"             "proof"           
 [361] "of"               "wrongdoing"       "lon"              "border"           "patrol"          
 [366] "agents"           "are"              "not"              "allowed"          "to"              
 [371] "properly"         "do"               "their"            "job"              "at"              
 [376] "the"              "border"           "because"          "of"               "ridiculous"      
 [381] "liberal"          "democrat"         "spent"            "very"             "little"          
 [386] "time"             "with"             "andrew"           "mccabe"           "but"             
 [391] "he"               "never"            "took"             "notes"            "when"            
 [396] "he"               "was"              "with"             "me"               "i"               
 [401] "dont"             "believe"          "he"               "made"             "mem"             
 [406] "the"              "united"           "states"           "has"              "an"              
 [411] "billion"          "dollar"           "yearly"           "trade"            "deficit"         
 [416] "because"          "of"               "our"              "very"             "stupid"          
 [421] "trade"            "deals"            "and"              "poli"             "i"               
 [426] "have"             "stated"           "my"               "concerns"         "with"            
 [431] "amazon"           "long"             "before"           "the"              "election"        
 [436] "unlike"           "others"           "they"             "pay"              "little"          
 [441] "or"               "no"               "taxes"            "to"               "state"           
 [446] "people"           "are"              "angry"            "at"               "some"            
 [451] "point"            "the"              "justice"          "department"       "and"             
 [456] "the"              "fbi"              "must"             "do"               "what"            
 [461] "is"               "right"            "and"              "proper"           "the"             
 [466] "american"         "public"           "deserves"         "it"               "from"            
 [471] "bush"             "to"               "present"          "our"              "country"         
 [476] "has"              "lost"             "more"             "than"             "factories"       
 [481] "manufacturing"    "jobs"             "and"              "accumulat"        "the"             
 [486] "deal"             "with"             "north"            "korea"            "is"              
 [491] "very"             "much"             "in"               "the"              "making"          
 [496] "and"              "will"             "be"               "if"               "completed"       
 [501] "a"                "very"             "good"             "one"              "for"             
 [506] "the"              "world"            "time"             "californias"      "sanctuary"       
 [511] "policies"         "are"              "illegal"          "and"              "unconstitutional"
 [516] "and"              "put"              "the"              "safety"           "and"             
 [521] "security"         "of"               "our"              "entire"           "nati"            
 [526] "why"              "does"             "the"              "mueller"          "team"            
 [531] "have"             "hardened"         "democrats"        "some"             "big"             
 [536] "crooked"          "hillary"          "supporters"       "and"              "zero"            
 [541] "republicans"      "an"               "ms"               "gang"             "members"         
 [546] "are"              "being"            "removed"          "by"               "our"             
 [551] "great"            "ice"              "and"              "border"           "patrol"          
 [556] "agents"           "by"               "the"              "thousands"        "but"             
 [561] "these"            "killers"          "history"          "shows"            "that"            
 [566] "a"                "school"           "shooting"         "lasts"            "on"              
 [571] "average"          "minutes"          "it"               "takes"            "police"          
 [576] "&amp;"            "first"            "responders"       "approxima"        "mexico"          
 [581] "is"               "doing"            "very"             "little"           "if"              
 [586] "not"              "nothing"          "at"               "stopping"         "people"          
 [591] "from"             "flowing"          "into"             "mexico"           "through"         
 [596] "their"            "southern"         "bor"              "if"               "the"             
 [601] "eu"               "wants"            "to"               "further"          "increase"        
 [606] "their"            "already"          "massive"          "tariffs"          "and"             
 [611] "barriers"         "on"               "us"               "companies"        "doing"           
 [616] "business"         "t"                "when"             "a"                "country"         
 [621] "taxes"            "our"              "products"         "coming"           "in"              
 [626] "at"               "say"              "and"              "we"               "tax"             
 [631] "the"              "same"             "product"          "coming"           "into"            
 [636] "our"              "country"          "at"               "ze"               "mike"            
 [641] "pompeo"           "director"         "of"               "the"              "cia"             
 [646] "will"             "become"           "our"              "new"              "secretary"       
 [651] "of"               "state"            "he"               "will"             "do"              
 [656] "a"                "fantastic"        "job"              "thank"            "you"             
 [661] "to"               "the"              "long"             "anticipated"      "release"         
 [666] "of"               "the"              "jfkfiles"         "will"             "take"            
 [671] "place"            "tomorrow"         "so"               "interesting"      "possible"        
 [676] "progress"         "being"            "made"             "in"               "talks"           
 [681] "with"             "north"            "korea"            "for"              "the"             
 [686] "first"            "time"             "in"               "many"             "years"           
 [691] "a"                "serious"          "effort"           "is"               "being"           
 [696] "great"            "briefing"         "this"             "afternoon"        "on"              
 [701] "the"              "start"            "of"               "our"              "southern"        
 [706] "border"           "wall"             "we"               "are"              "on"              
 [711] "the"              "losing"           "side"             "of"               "almost"          
 [716] "all"              "trade"            "deals"            "our"              "friends"         
 [721] "and"              "enemies"          "have"             "taken"            "advantage"       
 [726] "of"               "the"              "us"               "for"              "m"               
 [731] "because"          "of"               "the"              "&amp;"            "billion"         
 [736] "dollars"          "gotten"           "to"               "rebuild"          "our"             
 [741] "military"         "many"             "jobs"             "are"              "created"         
 [746] "and"              "our"              "military"         "i"                "question"        
 [751] "if"               "all"              "of"               "the"              "russian"         
 [756] "meddling"         "took"             "place"            "during"           "the"             
 [761] "obama"            "administration"   "right"            "up"               "to"              
 [766] "january"          "th"               "why"              "as"               "the"             
 [771] "house"            "intelligence"     "committee"        "has"              "concluded"       
 [776] "there"            "was"              "no"               "collusion"        "between"         
 [781] "russia"           "and"              "the"              "trump"            "campaign"        
 [786] "as"               "we"               "are"              "not"              "in"              
 [791] "a"                "trade"            "war"              "with"             "china"           
 [796] "that"             "war"              "was"              "lost"             "many"            
 [801] "years"            "ago"              "by"               "the"              "foolish"         
 [806] "or"               "incompetent"      "people"           "who"              "r"               
 [811] "dont"             "focus"            "on"               "me"               "focus"           
 [816] "on"               "the"              "destructive"      "radical"          "islamic"         
 [821] "terrorism"        "that"             "is"               "taking"           "place"           
 [826] "within"           "th"               "its"              "march"            "th"              
 [831] "and"              "the"              "democrats"        "are"              "nowhere"         
 [836] "to"               "be"               "found"            "on"               "daca"            
 [841] "gave"             "them"             "months"           "they"             "just"            
 [846] "dont"             "care"             "where"            "a"                "for"             
 [851] "years"            "and"              "through"          "many"             "administrations" 
 [856] "everyone"         "said"             "that"             "peace"            "and"             
 [861] "the"              "denuclearization" "of"               "the"              "korean"          
 [866] "peninsu"          "school"           "shooting"         "survivor"         "says"            
 [871] "he"               "quit"             "town"             "hall"             "after"           
 [876] "refusing"         "scripted"         "question"         "just"             "like"            
 [881] "rest"             "in"               "peace"            "billy"            "graham"          
 [886] "a"                "total"            "witch"            "hunt"             "with"            
 [891] "massive"          "conflicts"        "of"               "interest"         "my"              
 [896] "administration"   "stands"           "in"               "solidarity"       "with"            
 [901] "the"              "brave"            "citizens"         "in"               "orange"          
 [906] "county"           "defending"        "their"            "rights"           "against"         
 [911] "cali"             "got"              "billion"          "to"               "start"           
 [916] "wall"             "on"               "southern"         "border"           "rest"            
 [921] "will"             "be"               "forthcoming"      "most"             "importantly"     
 [926] "got"              "billion"          "to"               "if"               "a"               
 [931] "potential"        "sicko"            "shooter"          "knows"            "that"            
 [936] "a"                "school"           "has"              "a"                "large"           
 [941] "number"           "of"               "very"             "weapons"          "talented"        
 [946] "teachers"         "and"              "ot"               "rasmussen"        "and"             
 [951] "others"           "have"             "my"               "approval"         "ratings"         
 [956] "at"               "around"           "which"            "is"               "higher"          
 [961] "than"             "obama"            "and"              "yet"              "the"             
 [966] "political"        "pund"             "my"               "twitter"          "account"         
 [971] "was"              "taken"            "down"             "for"              "minutes"         
 [976] "by"               "a"                "rogue"            "employee"         "i"               
 [981] "guess"            "the"              "word"             "must"             "finally"         
 [986] "be"               "getting"          "outand"           "having"           "an"              
 [991] "impact"           "i"                "want"             "to"               "encourage"       
 [996] "all"              "of"               "my"               "many"             "texas"           
 [ reached getOption("max.print") -- omitted 7774 entries ]
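
To see what the simplify argument changes, here is a small sketch on two made-up strings (all object names are hypothetical):

```r
library(stringr)

# Two made-up strings with different numbers of words
toy <- c("happy easter", "rest in peace")

# simplify = FALSE: a list with one character vector per input string
as_list <- str_split(toy, pattern = " ", simplify = FALSE)

# simplify = TRUE: a character matrix, padded with "" where a row runs short
as_matrix <- str_split(toy, pattern = " ", simplify = TRUE)

# Flatten the list into a single character vector of words
word_vector <- unlist(as_list)
```

The list form avoids the empty-string padding, which is why it is convenient to unlist it directly into a word vector.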

We can now do our anti_join to remove ‘stop words’, and tally our remaining words and order them as before.

words <- anti_join(data.frame(words, stringsAsFactors = FALSE), stop, by = c("words" = "word"))

Hmmm… it looks like we have those HTML entities (such as &amp;) in a different format. It’s interesting to note these little variations, because no matter how much you try to automate your analysis, there is always going to be something in your new dataset that didn’t fit with your old dataset. This is why we need these data wrangling skills. Even though some packages may have been created to help us on our way, they can’t possibly cover every case. And they all work slightly differently.


#sort = TRUE orders the tally from most to least frequent
words %>% count(words, sort = TRUE)
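
The same remove-then-tally idea can be sketched on a tiny made-up word table. The objects `toy_words` and `toy_stop` are hypothetical stand-ins for the word list and the stop word table; they are not the lesson's data.

```r
library(dplyr)

# Made-up word list and stop word table, both with a character column `word`
toy_words <- data.frame(word = c("the", "fake", "news", "is", "fake", "the", "wall"),
                        stringsAsFactors = FALSE)
toy_stop <- data.frame(word = c("the", "is"), stringsAsFactors = FALSE)

# Keep only rows whose word is NOT in the stop list, then tally and sort
word_counts <- toy_words %>%
  anti_join(toy_stop, by = "word") %>%
  count(word, sort = TRUE)
```

Keeping both columns as character (rather than factor) avoids the coercion message that a join between a factor and a character column produces.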

The top words include ‘president’, ‘people’, ‘fake’, ‘news’, ‘daca’, ‘democrats’, ‘jobs’, ‘obama’, ‘border’, ‘fbi’, ‘collusion’, ‘russia’, ‘wall’ and ‘mexico’, and further down are ‘bad’, ‘crooked’ and ‘hillary’.

Trump’s wordcloud minus his twitter handle.


Challenge

Pick one of the other tweet data sets [insert possibilities]. Clean it. Remove all of the stop words. Make a wordcloud of the top 50 words.





- rounding data - we don’t need it to the umpteenth decimal
- look for lowest and highest values - do these make sense? (standard deviation?)
- called a range check, spell check, regex
- document what you are doing
- not all data sets are 100% clean
- also the paste functions
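
A minimal base R sketch of the rounding, range check and paste ideas in these notes. The values and the plausibility limits are invented for illustration only.

```r
# Made-up cost values, including two that should look suspicious
costs <- c(1834.567891, 2040.0, -5, 999999, 1500.25)

# Rounding: two decimal places is plenty for currency
rounded <- round(costs, 2)

# Range check: do the lowest and highest values make sense?
cost_range <- range(costs)

# Flag values outside hypothetical plausibility limits (0 to 10000)
suspicious <- costs[costs < 0 | costs > 10000]

# paste() joins pieces with a space by default; paste0() uses no separator
label <- paste("cost:", rounded[1], "GBP")
label_tight <- paste0("GBP", rounded[1])
```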

- the basics running through a fun example - do a chosen one, see what other problems come up
- rmarkdown, syntax, etc
- make a pdf of cleaning the WellcomeTrust dataset (give a brief outline of what needs to be done)

remember regex testers https://regex101.com/ https://regexr.com/

-regexpr, str_locate
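
For reference, a short sketch of how these two position-finding functions differ, using made-up strings (object names are hypothetical):

```r
library(stringr)

# Made-up strings; the second one contains no match
toy <- c("fake news", "no match", "the news")

# Base R: start position of the first match, -1 when there is none
# (the match length is stored as an attribute)
base_pos <- regexpr("news", toy)

# stringr: a matrix with start and end columns, NA when there is no match
tidy_pos <- str_locate(toy, "news")
```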



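#extract tags (#, @) and words, allowing one internal non-space character (e.g. an apostrophe or period)
#note: inside [ ] a | is a literal pipe, not 'or', so it is left out of the character class
save <- str_extract_all(trump_tweets_df$text, "[#@[:alnum:]]+([^\\s][[:alnum:]]+)?", simplify = TRUE)
#gather into one long column, get rid of empty strings
test <- gather(as.data.frame(save, stringsAsFactors = FALSE), value = "word") %>% filter(word != "")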

It looks like we need some more data cleaning. First, let’s get rid of everything with numbers.

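#removes the first run of digits in each word (pull() returns the column as a character vector)
trump_words %>% pull(word) %>% str_remove("[0-9]+")

#removes numbers
str_remove_all(trump_words$word, "[0-9]+")
#punctuation attached to the numbers is still there, so remove everything from the first digit onwards
str_remove_all(trump_words$word, "[0-9].*")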

It’s looking better. We have a single stray hashtag. The ‘’s’ endings should be removed: what remains will either match the same word elsewhere or, if it came from a contraction, be removed via the stop words list. There is also a ‘u.s’ where we could get rid of the period. If anyone can find out how to remove the apostrophe and not the period, let me know.

str_remove_all(trump_words$word, "#$")


gsub("[[:punct:]]s$", "", trump_words$word)
#let's check to see what this will remove.
grep("[[:punct:]]s$", trump_words$word, value=TRUE)
#the only thing that isn't an 's is u.s. We don't want this truncated to 'u', but we also
#don't want to just remove the period first, because we want to retain that it means
#United States and not 'us'. I couldn't find a regex punctuation solution to this
#(BONUS points if you do), so instead we are going to REPLACE "u.s" with "usa".
#The period is escaped and the pattern is anchored so that only the whole word u.s matches;
#an unescaped "u.s" would also match the "uss" in "russia".

trump_words$word <- str_replace(trump_words$word, "^u\\.s$", "usa")

#check
grep("[[:punct:]]s$", trump_words$word, value=TRUE)

trump_words$word <- str_remove_all(trump_words$word, "[[:punct:]]s$")
trump_words$word <- str_remove_all(trump_words$word, "#$")

#check
grep("[[:punct:]]s$", trump_words$word, value=TRUE)


#once we know we've got it right we can filter the data frame
trump_words <- trump_words %>% mutate(word = str_remove_all(word, "[0-9].*")) %>% filter(word != "")
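
One possible answer to the bonus question, offered only as a sketch: match an apostrophe specifically instead of the whole [[:punct:]] class, so that a period is never touched. The words below are made up, and the assumption is that the apostrophes in the data are either straight (') or curly (written here as the escape \u2019).

```r
library(stringr)

# Made-up words: possessives with straight and curly apostrophes, plus u.s
toy_words <- c("trump's", "america\u2019s", "u.s", "obama", "it's")

# Remove a straight or curly apostrophe followed by a final "s";
# the period in "u.s" is not in the character class, so it is left alone
no_possessive <- str_remove(toy_words, "['\u2019]s$")
```

Whether this is enough depends on which apostrophe characters actually occur in the tweets, so it is worth checking with grep(..., value = TRUE) first, as done above.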

I looked for a dataset for data cleaning and found it in a blog titled “Biologists: this is why bioinformaticians hate you…”. The main and common issue with this dataset is that when data entry was done there was no structured vocabulary, meaning that people could type in whatever they wanted. There were no dropdown menus with limited options, no errors raised when something was formatted incorrectly, and no stipulated rules (i.e. must be all lowercase, uppercase, no numbers, spacing, etc.). I must admit I have been guilty of messing with people who have made databases without rules. For example, in giving an emergency contact there was a line to input ‘Relationship’, which could easily have been a dropdown menu (‘parent, partner, friend, other’), but instead I was allowed to write in a free text line ‘lifelong kindred spirit, soulmate and doggy-daddy’. I don’t think anyone here was trying to be a nuisance; this is just a consequence of poor data collection. There is a README file to go with this spreadsheet if you have questions about the data fields.

http://www.opiniomics.org/biologists-this-is-why-bioinformaticians-hate-you/
https://figshare.com/articles/Wellcome_Trust_APC_spend_2012_13_data_file/963054

What I want to know is:
1. List 5 problems with this data set.
2. Which publisher is the most expensive to publish with?
3. Which journal is the most expensive to publish with? Is this by the same publisher?
4. Convert sterling to CAD. What is the median cost of publishing with Elsevier in CAD?

The blogger’s opinion of cleaning this dataset:

‘I now have no hair left; I’ve torn it all out. My teeth are just stumps from excessive gnashing. My faith in humanity has been destroyed!’

Don’t get to this point. The dataset doesn’t need to be perfect. Just do what you gotta do to answer these questions.

Note to self: This may be too tough - see how long it takes to do.

Approximate time: 2 hours per lesson

Each lesson will have:


R markdown and knitr

Challenge:

Take the original gapminder dataset and convert it to the ‘clean’ dataset found in the gapminder package / find some horrible dataset to clean. Present in a knitr table, explaining some of your data cleaning challenges in rmarkdown. Knit the document to a pdf.

Resources:
http://stat545.com/block022_regular-expression.html
http://stat545.com/block027_regular-expressions.html
http://stat545.com/block028_character-data.html
http://r4ds.had.co.nz/strings.html
http://www.gastonsanchez.com/Handling_and_Processing_Strings_in_R.pdf
http://varianceexplained.org/r/trump-tweets/
http://www.opiniomics.org/biologists-this-is-why-bioinformaticians-hate-you/
https://figshare.com/articles/Wellcome_Trust_APC_spend_2012_13_data_file/963054

Post-Lesson Assessment


Questions

Notes

LWF3ZnVsIGFuZCBwb3dlcmZ1bCBsYW5ndWFnZSBmb3IgZXhwcmVzc2luZyBwYXR0ZXJucyB0byBtYXRjaCBpbiB0ZXh0IG9yIGZvciBzZWFyY2gtYW5kLXJlcGxhY2UuIEZyZXF1ZW50bHkgZGVzY3JpYmVkIGFzICd3cml0ZSBvbmx5JywgYmVjYXVzZSByZWd1bGFyIGV4cHJlc3Npb25zIGFyZSBlYXNpZXIgdG8gd3JpdGUgdGhhbiB0byByZWFkL3VuZGVyc3RhbmQuIEFuZCB0aGV5IGFyZSBub3QgcGFydGljdWxhcmx5IGVhc3kgdG8gd3JpdGUuIgogICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAtIEplbm55IEJyeWFuCgohW10oaW1nL3hrY2QtMTE3MS1wZXJsX3Byb2JsZW1zLnBuZykKCjwvYnI+CgpTbyB3aHkgZG9lcyByZWdleCBnZXQgc28gbXVjaCBmbGFrPwoKU2NhcnkgZXhhbXBsZTogaG93IHRvIGdldCBhbiBlbWFpbCBpbiBkaWZmZXJlbnQgcHJvZ3JhbW1pbmcgbGFuZ3VhZ2VzIChodHRwOi8vZW1haWxyZWdleC5jb20vKS4gV2hhdGV2ZXIuIAoKCiFbXShpbWcvODAxNzBjMTE5OTZiZDU4ZTQyMmRiYjY2MzFiNzNjNGIuanBnKSAKCiFbXShpbWcvcmVnZXhieXRyaWFsYW5kZXJyb3ItYmlnLXNtYWxsZXIucG5nKQoKPC9icj4KCldoYXQgZG9lcyB0aGUgbGFuZ3VhZ2UgbG9vayBsaWtlPwoKCl9NYXRjaGluZyBieSBwb3NpdGlvbl8KCldoZXJlIGlzIHRoZSBjaGFyYWN0ZXIgaW4gdGhlIHN0cmluZz8KCmBgYHtyIGVjaG8gPSBGQUxTRSwgZXZhbCA9IFRSVUUsIHdhcm5pbmcgPSBGQUxTRX0KbGlicmFyeShrbml0cikKbGlicmFyeShrYWJsZUV4dHJhKQoKdGV4dF90YWJsZSA8LSBkYXRhLmZyYW1lKAogIEV4cHJlc3Npb24gPSBjKCJeIiwgIiQiLCAiXFxiIiwgIlxcQiIpLAogIE1lYW5pbmcgPSBjKCJzdGFydCBvZiBzdHJpbmciLCAiZW5kIG9mIHN0cmluZyIsICJlbXB0eSBzdHJpbmcgYXQgZWl0aGVyIGVkZ2Ugb2YgYSB3b3JkIiwgImVtcHR5IHN0cmluZyB0aGF0IGlzIE5PVCBhdCB0aGUgZWRnZSBvZiBhIHdvcmQiKQopCgprYWJsZSh0ZXh0X3RhYmxlLCAiaHRtbCIpICU+JQogIGthYmxlX3N0eWxpbmcoZnVsbF93aWR0aCA9IEYpICU+JQogIGNvbHVtbl9zcGVjKDEsIGl0YWxpYyA9IFQsIGJvcmRlcl9yaWdodCA9IFQpICU+JQogIGNvbHVtbl9zcGVjKDIsIHdpZHRoID0gIjQwZW0iKQpgYGAKCgoKX1F1YW50aWZpZXJzXwoKSG93IG1hbnkgdGltZXMgd2lsbCBhIGNoYXJhY3RlciBhcHBlYXI/CgpgYGB7ciBlY2hvID0gRkFMU0UsIGV2YWwgPSBUUlVFLCB3YXJuaW5nID0gRkFMU0V9CnRleHRfdGFibGUgPC0gZGF0YS5mcmFtZSgKICBFeHByZXNzaW9uID0gYygiPyIsICIqIiwiKyIsICJ7bn0iLCAie24sfSIsICJ7LG59IiwgIntuLG19IiksCiAgTWVhbmluZyA9IGMoIjAgb3IgMSIsICIwIG9yIG1vcmUiLCAiMSBvciBtb3JlIiwgImV4YWN0bHkgbiIsICJhdCBsZWFzdCBuIiwgImF0IG1vc3QgbiIsICJiZXR3ZWVu
IG4gYW5kIG0gKGluY2x1c2l2ZSkiKQopCgprYWJsZSh0ZXh0X3RhYmxlLCAiaHRtbCIpICU+JQogIGthYmxlX3N0eWxpbmcoZnVsbF93aWR0aCA9IEYpICU+JQogIGNvbHVtbl9zcGVjKDEsIGl0YWxpYyA9IFQsIGJvcmRlcl9yaWdodCA9IFQpICU+JQogIGNvbHVtbl9zcGVjKDIsIHdpZHRoID0gIjQwZW0iKQpgYGAKCgpfQ2xhc3Nlc18KCldoYXQga2luZCBvZiBjaGFyYWN0ZXIgaXMgaXQ/CgpgYGB7ciBlY2hvID0gRkFMU0UsIGV2YWwgPSBUUlVFLCB3YXJuaW5nID0gRkFMU0V9CnRleHRfdGFibGUgPC0gZGF0YS5mcmFtZSgKICBFeHByZXNzaW9uID0gYygiXFx3LCBbQS16MC05XSwgW1s6YWxudW06XV0iLCAiXFxkLCBbMC05XSwgW1s6ZGlnaXQ6XV0iLCAiW0Etel0sIFs6YWxwaGE6XSIsICJcXHMsIFtbOnNwYWNlOl1dIiwgIltbOnB1bmN0Ol1dIiwgIltbOmxvd2VyOl1dIiwgIltbOnVwcGVyOl1dIiwgIlxcVywgW15BLXowLTldIiwgIlxcUyIsICJcXEQsIFteMC05XSIpLAogIE1lYW5pbmcgPSBjKCJ3b3JkIGNoYXJhY3RlcnMgKGxldHRlcnMgKyBkaWdpdHMpIiwgImRpZ2l0cyIsICJhbHBoYWJldGljYWwgY2hhcmFjdGVycyIsICJzcGFjZSIsICJwdW5jdHVhdGlvbiIsICJsb3dlcmNhc2UiLCAidXBwZXJjYXNlIiwgIm5vdCB3b3JkIGNoYXJhY3RlcnMiLCAibm90IHNwYWNlIiwgIm5vdCBkaWdpdHMiKQopCgprYWJsZSh0ZXh0X3RhYmxlLCAiaHRtbCIpICU+JQogIGthYmxlX3N0eWxpbmcoZnVsbF93aWR0aCA9IEYpICU+JQogIGNvbHVtbl9zcGVjKDEsIGl0YWxpYyA9IFQsIGJvcmRlcl9yaWdodCA9IFQpICU+JQogIGNvbHVtbl9zcGVjKDIsIHdpZHRoID0gIjQwZW0iKQpgYGAKCgpfT3BlcmF0b3JzXwoKSGVscGVyIGFjdGlvbnMgdG8gbWF0Y2ggeW91ciBjaGFyYWN0ZXJzLgoKYGBge3IgZWNobyA9IEZBTFNFLCBldmFsID0gVFJVRSwgd2FybmluZyA9IEZBTFNFfQp0ZXh0X3RhYmxlIDwtIGRhdGEuZnJhbWUoCiAgRXhwcmVzc2lvbiA9IGMoInwiLCAiLiIsICJbICBdIiwgIlsgLSBdIiwgIlteIF0iLCAiKCApIiksCiAgTWVhbmluZyA9IGMoIm9yIiwgIm1hdGNoZXMgYW55IHNpbmdsZSBjaGFyYWN0ZXIiLCAibWF0Y2hlcyBBTlkgb2YgdGhlIGNoYXJhY3RlcnMgaW5zaWRlIHRoZSBicmFja2V0cyIsICJtYXRjaGVzIGEgUkFOR0Ugb2YgY2hhcmFjdGVycyBpbnNpZGUgdGhlIGJyYWNrZXRzIiwgIm1hdGNoZXMgYW55IGNoYXJhY3RlciBFWENFUFQgdGhvc2UgaW5zaWRlIHRoZSBicmFja2V0IiwgImdyb3VwaW5nIC0gdXNlZCBmb3IgX2JhY2tyZWZlcmVuY2luZ18iKQopCgprYWJsZSh0ZXh0X3RhYmxlLCAiaHRtbCIpICU+JQogIGthYmxlX3N0eWxpbmcoZnVsbF93aWR0aCA9IEYpICU+JQogIGNvbHVtbl9zcGVjKDEsIGl0YWxpYyA9IFQsIGJvcmRlcl9yaWdodCA9IFQpICU+JQogIGNvbHVtbl9zcGVjKDIsIHdpZHRoID0gIjQwZW0iKQpgYGAKCl9Fc2NhcGUgY2hhcmFjdGVyc18KClNvbWV0aW1lcyBhIGNoYXJhY3RlciBpcyBq
dXN0IGEgY2hhcmFjdGVyLi4uIGFsbG93cyB5b3UgdG8gdXNlIGEgY2hhcmFjdGVyICdhcyBpcycgcmF0aGVyIHRoYW4gaXRzIHNwZWNpYWwgZnVuY3Rpb24uIEluIFIsIHJlZ2V4IGdldHMgZXZhbHVhdGVkIGFzIGEgc3RyaW5nIGJlZm9yZSBhIHJlZ3VsYXIgZXhwcmVzc2lvbiwgYW5kIGEgYmFja3NsYXNoIGlzIHVzZWQgdG8gZXNjYXBlIHRoZXJlIGFzIHdlbGwgLSBzbyB5b3UgcmVhbGx5IG5lZWQgMiBiYWNrc2xhc2hlcyB0byBlc2NhcGUsIHNheSwgYSAkIHNpZ24gKGAiXFxcJCJgKS4gCgpgYGB7ciBlY2hvID0gRkFMU0UsIGV2YWwgPSBUUlVFLCB3YXJuaW5nID0gRkFMU0V9CnRleHRfdGFibGUgPC0gZGF0YS5mcmFtZSgKICBFeHByZXNzaW9uID0gYygiXFwiKSwKICBNZWFuaW5nID0gYygiZXNjYXBlISBuZWNlc3NhcnkgdG8gdXNlIHNwZWNpYWwgbWV0YS1jaGFyYWN0ZXJzICgqLCAkLCBeLCAuLCA/LCB8LCBcXCwgWywgXSwgeywgfSwgKCwgKSkgW05vdGUgdGhhdCB0aGUgYmFja3NsYXNoIGlzIGEgbWV0YS1jaGFyYWN0ZXIgYXMgd2VsbF0iKQopCgprYWJsZSh0ZXh0X3RhYmxlLCAiaHRtbCIpICU+JQogIGthYmxlX3N0eWxpbmcoZnVsbF93aWR0aCA9IEYpICU+JQogIGNvbHVtbl9zcGVjKDEsIGl0YWxpYyA9IFQsIGJvcmRlcl9yaWdodCA9IFQpICU+JQogIGNvbHVtbl9zcGVjKDIsIHdpZHRoID0gIjQwZW0iKQpgYGAKClRyb3VibGUtc2hvb3Rpbmcgd2l0aCBlc2NhcGluZyBtZXRhLWNoYXJhY3RlcnMgbWVhbnMgYWRkaW5nIGJhY2tzbGFzaGVzIHVudGlsIHNvbWV0aGluZyB3b3Jrcy4gKEpva2luZy9ub3Qgam9raW5nKQoKIVtKb2tpbmcvTm90IEpva2luZ10oaW1nL2JhY2tzbGFzaGVzLnBuZykKCjwvYnI+CgojI0RhdGEgQ2xlYW5pbmcgd2l0aCBCYXNlIFIgKEFLQSBXaGF0IGlzIEVsb24gTXVzayB1cCB0byBhbnl3YXlzPykKCkxldCdzIHRha2UgdGhpcyBjYWNhcGhvbnkgb2YgY2hhcmFjdGVycyB3ZSd2ZSBqdXN0IGxlYXJuZWQgYWJvdXQgYW5kIHBlcmZvcm0gc29tZSBiYXNpYyBkYXRhIGNsZWFuaW5nIHRhc2tzIHdpdGggYW4gYWN0dWFsIChmdW4/KSBtZXNzeSBkYXRhIHNldC4gSSBoYXZlIHNjcmFwZWQgRWxvbiBNdXNrJ3MgbGF0ZXN0IHR3ZWV0cyBmcm9tIFR3aXR0ZXIuIFRoZSBjb2RlIHRvIGRvIHRoaXMgaXMgaW4gdGhlIExlc3NvbiA0IG1hcmtkb3duIGZpbGUgaWYgeW91IGFyZSBjdXJpb3VzIGFuZC9vciB3YW50IHRvIGNyZWVwIHNvbWVvbmUgb24gVHdpdHRlci4KCmBgYHtyIGVjaG8gPSBGQUxTRSwgaW5jbHVkZSA9IEZBTFNFLCBldmFsID0gRkFMU0UgfQpsaWJyYXJ5KHR3aXR0ZVIpCmxpYnJhcnkoaHR0cikKI1RoaXMgaXMgZnJvbSB0aGUgY29kZSBkZW1vIGZvciBodHRyOjpvYXV0aDEtdHdpdHRlcgojIDEuIEZpbmQgT0F1dGggc2V0dGluZ3MgZm9yIHR3aXR0ZXI6Cm9hdXRoX2VuZHBvaW50cygidHdpdHRlciIpCgojIDIuIFJlZ2lzdGVyIGFuIGFwcGxpY2F0aW9uIGF0IGh0dHBzOi8vYXBwcy50d2l0
dGVyLmNvbS8KIyAgICBNYWtlIHN1cmUgdG8gc2V0IGNhbGxiYWNrIHVybCB0byAiaHR0cDovLzEyNy4wLjAuMToxNDEwLyIKIwojICAgIFJlcGxhY2Uga2V5IGFuZCBzZWNyZXQgYmVsb3cKbXlhcHAgPC0gb2F1dGhfYXBwKCJ0d2l0dGVyIiwKICBrZXkgPSAiVFlyV0ZQa0ZBa240RzVCYmtXSU5ZdyIsCiAgc2VjcmV0ID0gInFqT2ttS1lVOWtXZlVGV21la0p1dTV0enRFOWFFZkxidDI2V2xoWkw4IgopCgojIDMuIEdldCBPQXV0aCBjcmVkZW50aWFscwp0d2l0dGVyX3Rva2VuIDwtIG9hdXRoMS4wX3Rva2VuKG9hdXRoX2VuZHBvaW50cygidHdpdHRlciIpLCBteWFwcCkKCiMgNC4gVXNlIEFQSSAtIG5vdyB0aGUgY29kZSBkaXZlcmdlcyBmcm9tIHRoZSBkZW1vCnNldHVwX3R3aXR0ZXJfb2F1dGgoY29uc3VtZXJfa2V5ID0gdHdpdHRlcl90b2tlbltbImFwcCJdXVtbImtleSJdXSAsIGNvbnN1bWVyX3NlY3JldCA9IHR3aXR0ZXJfdG9rZW5bWyJhcHAiXV1bWyJzZWNyZXQiXV0sIGFjY2Vzc190b2tlbiA9IHR3aXR0ZXJfdG9rZW5bWyJjcmVkZW50aWFscyJdXVtbIm9hdXRoX3Rva2VuIl1dLCBhY2Nlc3Nfc2VjcmV0ID0gdHdpdHRlcl90b2tlbltbImNyZWRlbnRpYWxzIl1dW1sib2F1dGhfdG9rZW5fc2VjcmV0Il1dICkKCiMzMjAwIGlzIHRoZSBtYXggbnVtYmVyIG9mIHR3ZWV0cyByZXF1ZXN0YWJsZQplbG9uX3R3ZWV0cyA8LSB1c2VyVGltZWxpbmUoImVsb25tdXNrIiwgbiA9IDMyMDApCmVsb25fdHdlZXRzX2RmIDwtIHRibF9kZihtYXBfZGYoZWxvbl90d2VldHMsIGFzLmRhdGEuZnJhbWUpKQoKd3JpdGUudGFibGUoZWxvbl90d2VldHNfZGYsICJkYXRhL2Vsb25fdHdlZXRzX2RmLnR4dCIsIHNlcCA9ICJcdCIpCgojdXNlZCB0aGUgZm9sbG93aW5nIGhhbmRsZXMgYyhlbG9uID0gImVsb25tdXNrIiwgbnllID0gIkJpbGxOeWUiLCBqdCA9ICJKdXN0aW5UcnVkZWF1IiwgY29sYmVydCA9ICJTdGVwaGVuQXRIb21lIiwgamVuZ2FyZHkgPSAiSmVubmlmZXJHYXJkeSIsIGplbm55ID0gIkplbm55QnJ5YW4iLCBrYXR5ID0gIkthdHlQZXJyeSIsIGRhaWx5ID0gIlRoZURhaWx5U2hvdyIsIGppbW15ID0gIkppbW15RmFsbG9uIiwgdHJ1bXAgPSAicmVhbERvbmFsZFRydW1wIikKYGBgCgpMZXQncyByZWFkIGluIHRoZSBzZXQgb2YgdHdlZXRzLCB0YWtlIGEgbG9vayBhdCB0aGUgc3RydWN0dXJlIG9mIHRoZSBkYXRhLCBhbmQgdXNlICd0aWR5dmVyc2UnIHRvIG9yZGVyIHRoZSBkYXRhIGJ5IHRoZSBtb3N0IHBvcHVsYXIgKGZhdm9yaXRlZCkgdHdlZXRzLiBMZXQncyBjaGVjayBvdXQgdGhlIHRvcCA1IGZhdm9yaXRlIHR3ZWV0cy4KCmBgYHtyfQpsaWJyYXJ5KHRpZHl2ZXJzZSkKCmVsb25fdHdlZXRzX2RmIDwtIHJlYWQuZGVsaW0oImRhdGEvZWxvbl90d2VldHNfZGYudHh0Iiwgc2VwID0gIlx0Iiwgc3RyaW5nc0FzRmFjdG9ycyA9IEYpCnN0cihlbG9uX3R3ZWV0c19kZikKCmVsb25fdHdlZXRzX2RmIDwtIGVsb25fdHdlZXRzX2RmICU+JSBh
cnJhbmdlKGRlc2MoZmF2b3JpdGVDb3VudCkpCgplbG9uX3R3ZWV0c19kZiR0ZXh0WzE6NV0KCgpgYGAKT3VyIGVuZCBnb2FsIGlzIGdvaW5nIHRvIGJlIHRvIGxvb2sgYXQgdGhlIHRvcCA1MCB3b3JkcyBpbiBFbG9uIE11c2sncyB0d2VldHMuIEkgZW1waGFzaXplIHdvcmRzLCBiZWNhdXNlIEkgZG9uJ3Qgd2FudCB1cmxzLCBvciBoYXN0YWdzLCBvciBvdGhlciB0YWdzLiBJIGFsc28gZG9uJ3Qgd2FudCBwdW5jdHVhdGlvbiBvciBzcGFjZXMuIEkgd2FudCB0byBleHRyYWN0IGp1c3QgdGhlIHdvcmRzIGZyb20gdHdlZXRzLiBGaXJzdCwgSSB3YW50IHRvIGdldCByZW1vdmUgdGhlIHRhZ3MgZnJvbSB0aGUgYmVnaW5uaW5nIG9mIHdvcmRzLiBJIGFtIGdvaW5nIHRvIHNhdmUgbXkgcmVnZXggZXhwcmVzc2lvbiBpbnRvIGFuIG9iamVjdCAtIHNvIHdlIGNhbiB1c2UgdGhlbSBhZ2FpbiBsYXRlci4KCldoYXQgdGhpcyBleHByZXNzaW9uIHNheXMgaXMgdGhhdCBJIHdhbnQgdG8gZmluZCBtYXRjaGVzIGZvciBhIGhhc3RhZyBPUiBhbiBhc3BlcmFuZCBmb2xsb3dlZCBieSBhdCBsZWFzdCBvbmUgd29yZCBjaGFyYWN0ZXIuIGBncmVwYCBpcyBhIGZ1bmN0aW9uIHRoYXQgYWxsb3dzIHVzIHRvIG1hdGNoIG91ciBwYXR0ZXJuIChvdXIgZXhwcmVzc2lvbikgdG8gYSBjaGFyYWN0ZXIgdmVjdG9yLgoKYGBge3J9CnRhZ3MgPC0gIiN8QFxcdysiCgpncmVwKHBhdHRlcm4gPSB0YWdzLCB4ID0gZWxvbl90d2VldHNfZGYkdGV4dCkKCmBgYApXZSBjYW4gc2VlIHRoYXQgYGdyZXBgIHJldHVybnMgdGhlIGluZGV4IG9mIHRoZSBtYXRjaC4gV2UgaGF2ZSBhIG51bWJlciBvZiBlbnRyaWVzIHRoYXQgaW5jbHVkZSB0YWdzLiBIb3dldmVyLCBpZiB3ZSB3YW50IHRvIHJldHVybiB0aGUgdHdlZXQgaXRzZWxmIGluc3RlYWQgb2YgdGhlIGluZGV4LCB3ZSBjYW4gdXNlIHRoZSBhcmd1bWVudCBgdmFsdWUgPSBUUlVFYC4gSXQgaXMgYSBnb29kIGlkZWEgdG8gZG8gYSB2aXN1YWwgaW5zcGVjdGlvbiBvZiB5b3VyIHJlc3VsdCB0byBtYWtlIHN1cmUgeW91ciBtYXRjaGVzIG9yIHN1YnN0aXR1dGlvbnMgYXJlIHdvcmtpbmcgdGhlIHdheSB5b3UgZXhwZWN0ZWQuIEluIHRoaXMgY2FzZSwgaXQgbG9va3MgbGlrZSBlYWNoIHR3ZWV0IGRvZXMgaGF2ZSBhIHRhZy4gV2UgY2FuIHRoZW4gdXNlIGBnc3ViYCB0byByZXBsYWNlIHRoYXQgcGF0dGVybiAob3VyIHRhZ3MpIHdpdGggbm90aGluZyAoYW4gZW1wdHkgc3RyaW5nKS4KCmBgYHtyfQoKZ3JlcCh0YWdzLCBlbG9uX3R3ZWV0c19kZiR0ZXh0LCB2YWx1ZSA9IFRSVUUpICU+JSBoZWFkKCkKCmVsb25fdHdlZXRzX2RmJHRleHQgPC0gZ3N1YihwYXR0ZXJuID0gdGFncywgcmVwbGFjZW1lbnQgPSAiIiwgZWxvbl90d2VldHNfZGYkdGV4dCkKCmBgYApJdCBhbHNvIGxvb2tzIGxpa2UgYW55dGhpbmcgdGhhdCB3YXMgYW4gYXBvc3Ryb3BoZSBoYXMgYmVlbiByZXBsYWNlZCB3aXRoIGEgc3RyaW5nIG9mIG51bWJlcnMgYW5k
IGJhY2tzbGFzaGVzLiBUaGlzIGlzIGR1ZSB0byB0aGUgZmFjdCB0aGF0IHR3ZWV0cyBhcmUgX2VuY29kZWRfIGluIFVURi0xNiBhbmQgY29udmVydGVkIHRvIFVURi04LiBPdGhlciB0aGluZ3MgdGhhdCBoYXZlIGNoYXJhY3RlciBjb2RlcywgbGlrZSBlbW9qaXMsIHdpbGwgYWxzbyBiZSBlbmNvZGVkIGRpZmZlcmVudGx5LiBIZXJlIGlzIGFuIGV4YW1wbGUgb2YgZW1vamkgZW5jb2Rpbmc6IGh0dHBzOi8vcmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbS90b2RheS1pcy1hLWdvb2QtZGF5L0Vtb3RpY29ucy9tYXN0ZXIvZW1EaWN0LmNzdi4gU2luY2Ugd2UgYXJlIGdvaW5nIHRvIHJlbW92ZSBwdW5jdHVhdGlvbiwgd2UgYXJlIGdvaW5nIHRvICdjb252ZXJ0JyB0aGUgZW5jb2RpbmcgYnkgc3Vic3RpdHV0aW5nIGl0IHdpdGggbm90aGluZyAoYWdhaW4sIGFuIGVtcHR5IGNoYXJhY3RlciBzdHJpbmcpLiBUaGlzIGlzIG5vdCBzb21ldGhpbmcgeW91IHdpbGwgaGF2ZSB0byBkZWFsIHdpdGggb24gYSBkYWlseSBiYXNpcywgYnV0IGNoYXJhY3RlciBlbmNvZGluZyBpcyBzb21ldGhpbmcgdG8gYmUgYXdhcmUgb2YsIGVzcGVjaWFsbHkgd2hlbiBzY3JhcGluZyBkYXRhIGZyb20gdGhlIHdlYi4gIAoKYGBge3J9Cmljb252KGVsb25fdHdlZXRzX2RmJHRleHQsICJVVEYtOCIsICJBU0NJSSIsIHN1Yj0iIikgJT4lIGhlYWQoKQoKZWxvbl90d2VldHNfZGYkdGV4dCA8LSBpY29udihlbG9uX3R3ZWV0c19kZiR0ZXh0LCAiVVRGLTgiLCAiQVNDSUkiLCBzdWIgPSAiIikKYGBgCgpPdXIgbmV4dCBzdGVwIHdvdWxkIGJlIHRvIHJlbW92ZSB1cmxzLiBUaGlzIGlzIGEgYml0IHRyaWNreS4gV2UgY291bGQgYmUgbG9va2luZyBmb3IgaHR0cDovLyBvciBodHRwczovLyBmb2xsb3dlZCBieSB3ZSBkb24ndCBrbm93IHdoYXQgKHNvbWUgY29tYmluYXRpb24gb2YgbGV0dGVycywgbnVtYmVycyBhbmQgZm9yd2FyZCBzbGFzaGVzKS4gCgpXZSBjYW4gY2hlY2sgb3V0IHdoaWNoIHR3ZWV0cyBoYXZlIHVybHMgdXNpbmcgYGdyZXBgIGFzIHdlIGRpZCBwcmV2aW91c2x5LiBXZSBjYW4gYWxzbyB1c2UgYGdyZXBsYCB0byBnZXQgYSBsb2dpY2FsIHJlcG9uc2UgZm9yIGlmIGEgdHdlZXQgaGFzIGEgdXJsIG9yIG5vdC4gVGhhdCB3YXksIGlmIHlvdSB3YW50ZWQgdG8gZ3JhYiBhbGwgb2YgdGhlIHVybHMgdGhhdCBFbG9uIE11c2sgc3VnZ2VzdHMgdG8gdmlzaXQsIHlvdSBjYW4gZmlsdGVyIHdpdGggYGdyZXBsYCwgdG8gc2VsZWN0IGFsbCBvZiB0aGUgdHdlZXRzIHdoZXJlIGl0IGlzIFRSVUUgdGhhdCBhIHVybCBpcyBwcmVzZW50LgoKSG93ZXZlciwgd2UgYXJlIGdvaW5nIHRvIGNvbnRpbnVlIG91ciBwYXR0ZXJuIG9mIHN1YnN0aXR1dGluZyB3aGF0IHdlIGRvbid0IHdhbnQgd2l0aCBhbiBlbXB0eSBjaGFyYWN0ZXIgc3RyaW5nLgoKYGBge3J9CnVybCA8LSAiaHR0cFtzXT86Ly9bWzphbG51bTpdLlxcL10rIgoKCmdyZXAodXJsLCBlbG9uX3R3ZWV0c19kZiR0ZXh0
LCB2YWx1ZSA9IFRSVUUpICU+JSBoZWFkKCkKZ3JlcGwodXJsLCBlbG9uX3R3ZWV0c19kZiR0ZXh0KSAlPiUgaGVhZCgpCgplbG9uX3VybHMgPC0gZWxvbl90d2VldHNfZGYgJT4lIGZpbHRlcihncmVwbCh1cmwsIGVsb25fdHdlZXRzX2RmJHRleHQpKQoKZWxvbl90d2VldHNfZGYkdGV4dCA8LSBnc3ViKHBhdHRlcm4gPSAiaHR0cFtzXT86Ly9bWzphbG51bTpdLlxcL10rIiwgcmVwbGFjZW1lbnQgPSAiIiwgZWxvbl90d2VldHNfZGYkdGV4dCkKCmBgYAoKTGFzdGx5LCB3ZSBhcmUgZ29pbmcgdG8gZ2V0IHJpZCBvZiB0cmFpbGluZyBzcGFjZXMsIG51bWJlcnMsIGFuZCBwdW5jdHVhdGlvbiBhbGwgYXQgdGhlIHNhbWUgdGltZS4gWW91IGNhbiBmaW5kIHRyYWlsaW5nIGF0IHRoZSB2ZXJ5IGVuZCBvZiBvdXIgdHdlZXQgc3RyaW5nLgoKYGBge3J9CnRyYWlsIDwtICJbIF0rJHxbMC05XSp8W1s6cHVuY3Q6XV0iCgpncmVwKHRyYWlsLCBlbG9uX3R3ZWV0c19kZiR0ZXh0LCB2YWx1ZSA9IFRSVUUpICU+JSBoZWFkKCkKCmVsb25fdHdlZXRzX2RmJHRleHQgPC0gZ3N1YihwYXR0ZXJuID0gdHJhaWwsIHJlcGxhY2VtZW50ID0gIiIsIGVsb25fdHdlZXRzX2RmJHRleHQpCgplbG9uX3R3ZWV0c19kZiR0ZXh0WzE6NV0KYGBgCkl0IGxvb2tzIGxpa2UgZXZlcnl0aGluZyB3b3JrZWQgZXhjZXB0IHRoYXQgYSBzcGFjaW5nIGlzc3VlIHdhcyBjcmVhdGVkIGJ5IHJlbW92aW5nIGFsbCBvZiB0aGUgbnVtYmVycy4gTGV0J3MgdGFrZSBhbGwgb2YgdGhlIHBsYWNlcyB3aGVyZSB0aGVyZSBhcmUgMiBvciBtb3JlIHNwYWNlcyBjcmVhdGVkIGFuZCBzdWJzdGl0dXRlIHRoZW0gd2l0aCBqdXN0IG9uZSBzcGFjZS4gCgpgYGB7cn0Kc3BhY2UgPC0gIlxcc3syLH0iCgpncmVwKHNwYWNlLCBlbG9uX3R3ZWV0c19kZiR0ZXh0LCB2YWx1ZSA9IFRSVUUpICU+JSBoZWFkKCkKCmVsb25fdHdlZXRzX2RmJHRleHQgPC0gZ3N1YihwYXR0ZXJuID0gc3BhY2UsIHJlcGxhY2VtZW50ID0gIiAiLCBlbG9uX3R3ZWV0c19kZiR0ZXh0KQoKZWxvbl90d2VldHNfZGYkdGV4dFsxOjVdCmBgYAoqKioKX19DaGFsbGVuZ2VfXyAKCgo8ZGl2IHN0eWxlPSJmbG9hdDpsZWZ0O21hcmdpbjowIDEwcHggMTBweCAwIiBtYXJrZG93bj0iMSI+CiFbXShpbWcvbWF4cmVzZGVmYXVsdC5qcGcpe3dpZHRoPTE1MHB4fQoKPC9kaXY+CgpXZSBhbHNvIGhhdmUgYSBsZWFkaW5nIHdoaXRlc3BhY2Ugd2hlcmUgd2UgcmVtb3ZlZCBhIG51bWJlci4gSG93IHdvdWxkIHdlIHJlbW92ZSB0aGF0IHdoaXRlc3BhY2U/CgoKPC9icj4KPC9icj4KPC9icj4KCioqKgoKYGBge3IgaW5jbHVkZSA9IEZBTFNFfQpleHRyYSA8LSAiXlsgXSIKCmdyZXAoZXh0cmEsIGVsb25fdHdlZXRzX2RmJHRleHQsIHZhbHVlID0gVFJVRSkgJT4lIGhlYWQoKQoKZWxvbl90d2VldHNfZGYkdGV4dCA8LSBnc3ViKHBhdHRlcm4gPSBleHRyYSwgcmVwbGFjZW1lbnQgPSAiIiwgZWxvbl90d2VldHNfZGYkdGV4dCkKCmVsb25fdHdlZXRz
X2RmJHRleHRbMTo1XQpgYGAKCk9ud2FyZHMhISBMZXQncyBicmVhayB0aGUgdGV4dHMgZG93biBpbnRvIGluZGl2aWR1YWwgd29yZHMsIHNvIHdlIGNhbiBzZWUgd2hhdCB0aGUgbW9zdCBjb21tb24gd29yZHMgdXNlZCBhcmUuIFdlIGNhbiB1c2UgdGhlIGJhc2UgUiBmdW5jdGlvbiBgc3Ryc3BsaXRgIHRvIGRvIHRoaXMsIGluIHRoaXMgY2FzZSB3ZSB3YW50IHRvIHNwbGl0IG91ciB0d2VldHMgaW50byB3b3JkcyBieSBzcGxpdHRpbmcgb24gc3BhY2VzLiAKCgpgYGB7cn0Kc3Ryc3BsaXQoZWxvbl90d2VldHNfZGYkdGV4dCwgc3BsaXQgPSAiICIpCgpgYGAKTm90ZSB0aGF0IHRoZSBvdXRwdXQgb2YgdGhpcyBmdW5jdGlvbiBpcyBzb21lIGhvcnJpYmxlIGxpc3Qgb2JqZWN0LiAKCmBgYHtyfQpzdHIoc3Ryc3BsaXQoZWxvbl90d2VldHNfZGYkdGV4dCwgc3BsaXQgPSAiICIpKQpgYGAKCkx1Y2tpbHkgdGhlcmUgaXMgYW4gYHVubGlzdGAgZnVuY3Rpb24gd2hpY2ggcmVjdXJzaXZlbHkgd2lsbCBnbyB0aHJvdWdoIGxpc3RzIHRvIHNpbXBsaWZ5IHRoZWlyIGVsZW1lbnRzIGludG8gYSB2ZWN0b3IuIExldCdzIHRyeSBpdCBhbmQgY2hlY2sgdGhlIHN0cnVjdHVyZSBvZiBvdXIgb3V0cHV0LiBXZSB3aWxsIHNhdmUgdGhpcyB0byBhbiBvYmplY3QgY2FsbGVkICd3b3JkcycuCgpgYGB7cn0KdW5saXN0KHN0cnNwbGl0KGVsb25fdHdlZXRzX2RmJHRleHQsIHNwbGl0ID0gIiAiKSkKCndvcmRzIDwtIHVubGlzdChzdHJzcGxpdChlbG9uX3R3ZWV0c19kZiR0ZXh0LCBzcGxpdCA9ICIgIikpCgpzdHIodW5saXN0KHN0cnNwbGl0KGVsb25fdHdlZXRzX2RmJHRleHQsIHNwbGl0ID0gIiAiKSkpCgp0YWlsKHdvcmRzKQpgYGAKR3JlYXQhIEJ1dC4uLiBJIG5vdGljZWQgdGhhdCB3ZSBtaXNzZWQgc29tZSBgXG5gIChuZXdsaW5lKSBhbmQgYFx0YCAodGFiKSBjaGFyYWN0ZXJzLgoKKioqCl9fQ2hhbGxlbmdlX18gCgoKPGRpdiBzdHlsZT0iZmxvYXQ6bGVmdDttYXJnaW46MCAxMHB4IDEwcHggMCIgbWFya2Rvd249IjEiPgohW10oaW1nL21heHJlc2RlZmF1bHQuanBnKXt3aWR0aD0xNTBweH0KCjwvZGl2PgoKTmV3bGluZSBhbmQgdGFiIGNoYXJhY3RlcnMgYXJlIHNlcGFyYXRpbmcgMiB3b3Jkcy4gU3BsaXQgdGhlc2Ugd29yZHMgYXBhcnQgYW5kIGdldCByaWQgb2YgdGhlIG5ld2xpbmUgY2hhcmFjdGVyLiBDb252ZXJ0IGFsbCBvZiBvdXIgY2hhcmFjdGVyIHN0cmluZ3MgdG8gbG93ZXJjYXNlIChJIGhhdmVuJ3Qgc2hvd24geW91IGhvdyB0byBkbyB0aGlzLCBidXQgSSBiZWxpZXZlIGluIHlvdXIgZ29vZ2xlLWZ1KS4gQ2hlY2sgdGhlIGZpcnN0IGFuZCBsYXN0IDUwIHdvcmRzIHRvIHNlZSBpZiBhbnl0aGluZyBlbHNlIGlzIGFtaXNzLgoKCjwvYnI+CjwvYnI+CjwvYnI+CgoqKioKCmBgYHtyIGluY2x1ZGUgPSBGQUxTRX0Kd29yZHMgPC0gdG9sb3dlcih1bmxpc3Qoc3Ryc3BsaXQod29yZHMsICJcXG58XFx0IikpKQoKd29yZHMgPC0gY2FzZWZvbGQo
dW5saXN0KHN0cnNwbGl0KHdvcmRzLCAiXFxufFxcdCIpKSwgdXBwZXIgPSBGQUxTRSkKCndvcmRzWzE6NTBdCnRhaWwod29yZHMsIDUwKQpgYGAKClRoZXJlIGFyZSBzdGlsbCBhIGZldyBwcm9ibGVtcyB3aXRoIHdvcmRzIGN1dG9mZiBsaWtlICdzb2x2Jywgb3IgJ2ZsYW1ldGhyb3dlcicgYW5kICdmbGFtZXRocm93ZXJzJyBiZWluZyB0aGUgc2FtZSB3b3JkLCBvciAnbm9ydGgnIGFuZCAna29yZWEnIGJlbG9uZ2luZyB0b2dldGhlciBmb3IgY29udGV4dC4gSWYgd2Ugd2VyZSBzZXJpb3VzIGFib3V0IHRoaXMgZGF0YXNldCB3ZSB3b3VsZCBuZWVkIHRvIHJlc29sdmUgdGhlc2UgaXNzdWVzLiBXZSBhbHNvIGhhdmUgc29tZSBodG1sIGFuZCB0d2l0dGVyLXNwZWNpZmljIHRhZ3MgdGhhdCB3ZSB3aWxsIGRlYWwgd2l0aCBzaG9ydGx5LiBIb3dldmVyLCB3ZSBhcmUgZ29pbmcgdG8gbW92ZSBhaGVhZCBhbmQgY291bnQgdGhlIG51bWJlciBvZiBvY2N1cmVuY2VzIG9mIGVhY2ggd29yZCBhbmQgb3JkZXIgdGhlbSBieSBmcmVxdWVuY3kuCgpgYGB7cn0KZGF0YS5mcmFtZSh3b3JkcykgJT4lIGNvdW50KGZhY3Rvcih3b3JkcykpICU+JSBhcnJhbmdlKGRlc2MobikpCmBgYAoKCgoKV293LiBXZSBoYXZlIGRpc2NvdmVyZWQgcGVvcGxlIHVzZSBwcmVwb3NpdGlvbnMgYW5kIGNvbmp1bmN0aW9ucywgYW5kIHdvcmRzIHVucmVsYXRlZCB0byBjb250ZW50IGJ1dCBodG1sIGphcmdvbiwgb3IgdGhpbmdzIGxpa2UgJ25hJyBhbmQgJ2ZhbHNlJy4gVGhlcmUgaXMgYSBsaXN0IG9mICdzdG9wIHdvcmRzJyB0aGF0IGNhbiBiZSB1c2VkIHRvIGdldCByaWQgb2Ygd29yZHMgdGhhdCBhcmUgdW5saWtlbHkgdG8gY29udGFpbiBpbmZvcm1hdGlvbiBmb3IgdXMgYXMgcGFydCBvZiB0aGUgYHRpZHl0ZXh0YCBwYWNrYWdlLiBIb3dldmVyLCB3ZSB3aWxsIGhhdmUgdG8gYWRkIHRvIHRoaXMgbGlzdC4KClRoZSBwcmVtYWRlIGRhdGFmcmFtZSBpcyBjYWxsZWQgYHN0b3Bfd29yZHNgLiBXZSBjYW4gc2F2ZSBpdCBhcyBhbiBvYmplY3QgYW5kIGFkZCB0byBpdCBieSBtYWtpbmcgYSBkYXRhZnJhbWUgb2YgdGhlIHdvcmRzIHdlIHdhbnQgdG8gYWRkLiBXZSBjYW4gY2FsbCBvdXIgbGV4aWNvbiAnY3VzdG9tJy4KCmBgYHtyfQpsaWJyYXJ5KHRpZHl0ZXh0KQpzdG9wIDwtIHN0b3Bfd29yZHMKc3RyKHN0b3ApCgphZGRfc3RvcCA8LSBkYXRhLmZyYW1lKHdvcmQgPSBjKCJuYSIsICJmYWxzZSIsICJocmVmIiwgInJlbCIsICJub2ZvbGxvdyIsICJ0cnVlIiwgImFtcCIsICJ0d2l0dGVyIiwgImlwaG9uZWEiLCAicmVsbm9mb2xsb3d0d2l0dGVyIiwgInJlbG5vZm9sbG93aW5zdGFncmFtYSIpLCAKICAgICAgICAgICAgICAgICAgICAgICBsZXhpY29uID0gImN1c3RvbSIsIHN0cmluZ3NBc0ZhY3RvcnMgPSBGQUxTRSkKI0kganVzdCBzaG91bGQgcG9pbnQgb3V0IHRoYXQgdHJ1ZSBvciBmYWxzZSAnY291bGQnIGJlIGEgd29yZCByYXRoZXIgdGhhbiBhIGxvZ2ljYWwsIEkg
Y291bGQgZmlsdGVyIG91dCBGQUxTRSBhbmQgVFJVRSBiZWZvcmUgY2hhbmdpbmcgZXZlcnl0aGluZyB0byBsb3dlcmNhc2UgYW5kIHRoaXMgd291bGQgcmVkdWNlIHRoZSBsaWtlbHlob29kIG9mIG1pc3NpbmcgJ3dvcmRzJwojYWxzbyBub3RlIHRoYXQgJ2N1c3RvbScgcmVjeWNsZXMgYXMgYSBjaGFyYWN0ZXIgdmVjdG9yIG9mIGxlbmd0aCAxCgpzdG9wIDwtIGJpbmRfcm93cyhzdG9wLCBhZGRfc3RvcCkKYGBgCgpUbyByZW1vdmUgdGhlc2Ugc3RvcCB3b3JkcyBmcm9tIG91ciBsaXN0LCB3ZSBwZXJmb3JtIGFuIGFudGktam9pbiAoZnJvbSBMZXNzb24gMykuCgpgYGB7cn0Kd29yZHMgPC0gYW50aV9qb2luKGRhdGEuZnJhbWUod29yZHMpLCBzdG9wLCBieT1jKCJ3b3JkcyIgPSAid29yZCIpKQoKYGBgCmBgYHtyfQp3b3JkcyAlPiUgY291bnQod29yZHMpICU+JSBhcnJhbmdlKGRlc2MobikpCgp3b3JkcyA8LSB3b3JkcyAlPiUgY291bnQod29yZHMpICU+JSBhcnJhbmdlKGRlc2MobikpCmBgYAoKJ2JvcmluZycsICdmYWxjb24nLCAndGVzbGEnLCAncm9ja2V0JywgJ2xhdW5jaCcsJ2ZsYW1ldGhyb3dlcicsICdjYXJzJywgJ3NwYWNleCcsICd0dW5uZWxzJywgYW5kICdtYXJzJyBhbmQgJ2FpJyBhcmUgYSBiaXQgZnVydGhlciBkb3duLiBUaGVyZSBhcmUgYSBmZXcgd29yZHMgdGhhdCBsb29rIGxpa2UgdGhleSBzaG91bGQgYmUgYWRkZWQgdG8gdGhlICdzdG9wIHdvcmRzJyBsaXN0IChkb250LCBkb2VzbnQsIGRpZG50LCBpbSksIGJ1dCB3ZSdsbCB3b3JrIHdpdGggdGhpcyBmb3Igbm93LgoKV2UgY2FuIG1ha2UgYSB3b3JkIGNsb3VkIG91dCBvZiB0aGUgdG9wIDUwIHdvcmRzLCB3aGljaCB3aWxsIGJlIHNpemVkIGFjY29yZGluZyB0byB0aGVpciBmcmVxdWVuY3kuIEkgYW0gc3RhcnRpbmcgd2l0aCB0aGUgZmlyc3Qgd29yZCBhZnRlciBFbG9uIE11c2sncyB0d2l0dGVyIGhhbmRsZS4gVGhlIGRlZmF1bHQgY29sb3IgaXMgYmxhY2ssIGJ1dCB3ZSBjYW4gdXNlIG91ciB2aXJpZGlzIHBhY2thZ2UgKExlc3NvbiAzKSB0byBoYXZlIGEgcGxlYXNpbmcgY29sb3IgcGFsZXR0ZS4KCmBgYHtyfQpsaWJyYXJ5KCJ3b3JkY2xvdWQiKQpsaWJyYXJ5KCJ2aXJpZGlzIikKCndvcmRzWzI6NTEsXSAlPiUKICAgIHdpdGgod29yZGNsb3VkKHdvcmRzLCBuLCBvcmRlcmVkLmNvbG9ycyA9IFRSVUUsIGNvbG9ycyA9IHZpcmlkaXMoNTApLCB1c2Uuci5sYXlvdXQgPSBUUlVFKSkKYGBgCgoKCiMjRGF0YSBDbGVhbmluZyB3aXRoIHN0cmluZ3Ivc3RyaW5naSAoQUtBIFdoYXQgaXMgVHJ1bXAgdXAgdG8gYW55d2F5cz8pCgpXZSBhcmUgZ29pbmcgdG8gZG8gdGhlIGV4YWN0IHNhbWUgZGF0YSBjbGVhbmluZyB3aXRoIHRoZSBgc3RyaW5ncmAgcGFja2FnZSB1c2luZyBUcnVtcCdzIHR3ZWV0cy4gVGhlIHN5bnRheCBpcyBhIGxpdHRsZSBkaWZmZXJlbnQsIGJ1dCBpdCBpcyBwcmV0dHkgaW50dWl0aXZlIG9uY2UgeW91IGdldCBzdGFydGVkLiBBbGwgYHN0cmlu
Z3JgIGZ1bmN0aW9ucyBjYW4gYmUgZm91bmQgdXNpbmcgYHN0cl9gICsgYFRhYmAuIEFnYWluLCB3ZSB3aWxsIHN0YXJ0IGJ5IGxvYWRpbmcgdGhlIGRhdGFzZXQgYW5kIGxvb2tpbmcgYXQgdGhlIHRvcCA1IGZhdm9yaXRlIHR3ZWV0cy4KCmBgYHtyfQp0cnVtcF90d2VldHNfZGYgPC0gcmVhZC5kZWxpbSgiZGF0YS90cnVtcF90d2VldHNfZGYudHh0Iiwgc2VwID0gIlx0IikKCgp0cnVtcF90d2VldHNfZGYgPC0gdHJ1bXBfdHdlZXRzX2RmICU+JSBhcnJhbmdlKGRlc2MoZmF2b3JpdGVDb3VudCkpCnRydW1wX3R3ZWV0c19kZiR0ZXh0WzE6NV0KCmljb252KHRydW1wX3R3ZWV0c19kZiR0ZXh0LCAiVVRGLTgiLCAiQVNDSUkiLCBzdWI9IiIpICU+JSBoZWFkKCkKCnRydW1wX3R3ZWV0c19kZiR0ZXh0IDwtIGljb252KHRydW1wX3R3ZWV0c19kZiR0ZXh0LCAiVVRGLTgiLCAiQVNDSUkiLCBzdWIgPSAiIikKYGBgCgpUaGUgZmlyc3QgdGhpbmcgdGhhdCB3ZSBkaWQgd2FzIGxvb2sgZm9yIHRhZ3MuIFRoZSBhcmd1bWVudHMgYXJlIHN3aXRjaGVkIGluIGBzdHJpbmdyYCByZWxhdGl2ZSB0byB0aGUgYmFzZSBmdW5jdGlvbnMuIFRoZSBmaXJzdCBhcmd1bWVudCB3aWxsIGJlIHRoZSBjaGFyYWN0ZXIgc3RyaW5nIHdlIGFyZSBzZWFyY2hpbmcsIGFuZCB0aGUgc2Vjb25kIGFyZ3VtZW50IHdpbGwgYmUgdGhlIHBhdHRlcm4gd2UgYXJlIG1hdGNoaW5nLiBgc3RyX2V4dHJhY3RgIHdpbGwgcmV0dXJuIHRoZSBpbmRleCBvZiB0aGUgbWF0Y2gsIGFzIHdlbGwgYXMgdGhlIG1hdGNoLiBUaGlzIGlzIHNpbWlsYXIgdG8gYGdyZXBgIHdoZW4gYHZhbHVlID0gVFJVRWAuCgpgYGB7cn0Kc3RyX2V4dHJhY3Qoc3RyaW5nID0gdHJ1bXBfdHdlZXRzX2RmJHRleHQsIHBhdHRlcm4gPSB0YWdzKQoKYGBgCgpgc3RyX2RldGVjdGAgaXMgc2ltaWxhciB0byBgZ3JlcGxgIHJldHVybmluZyBUUlVFIG9yIEZBTFNFIGlmIGEgbWF0Y2ggaXMgb3IgaXNuJ3QgZm91bmQsIHJlc3BlY3RpdmVseS4KCmBgYHtyfQpzdHJfZGV0ZWN0KHRydW1wX3R3ZWV0c19kZiR0ZXh0LCB0YWdzKQpgYGAKCkxldCdzIGJlIGFtYml0aW91cyBhbmQgdHJ5IHRvIHJlbW92ZSB0YWdzLCB1cmxzLCBuZXdsaW5lIGFuZCB0YWIgY2hhcmFjdGVycyBhbmQgbnVtYmVycyBhbGwgaW4gb25lIGdvLiBgc3RyX3JlbW92ZWAgYXV0b21hdGljYWxseSByZXBsYWNlcyB0aGUgbWF0Y2ggd2l0aCBhbiBlbXB0eSBjaGFyYWN0ZXIgc3RyaW5nLgoKYGBge3J9CmNsZWFuIDwtICJodHRwW3NdPzovL1tbOmFsbnVtOl0uXFwvXSsiCgojY2xlYW4yIDwtICIjfEBbYS16QS1aXSsiCgpjbGVhbjIgPC0gIiN8QFxcdysiCgpjbGVhbjMgPC0gIlxcJD9bMC05XSolPyIKCmNsZWFuNCA8LSAiXFwuPzo/LD8tPyE/XFwoP1xcKT9cXD8/IgoKY2xlYW41IDwtICJbWzpwdW5jdDpdXSIKCnN0cl9yZXBsYWNlX2FsbCh0cnVtcF90d2VldHNfZGYkdGV4dCwgY2xlYW41KQoKdHJ1bXBfdHdlZXRzX2RmJHRleHQgPC0gc3RyX3JlbW92
ZV9hbGwodHJ1bXBfdHdlZXRzX2RmJHRleHQsIHBhdHRlcm4gPSBjbGVhbikKdHJ1bXBfdHdlZXRzX2RmJHRleHQgPC0gc3RyX3JlbW92ZV9hbGwodHJ1bXBfdHdlZXRzX2RmJHRleHQsIHBhdHRlcm4gPSBjbGVhbjIpCnRydW1wX3R3ZWV0c19kZiR0ZXh0IDwtIHN0cl9yZW1vdmVfYWxsKHRydW1wX3R3ZWV0c19kZiR0ZXh0LCBwYXR0ZXJuID0gY2xlYW4zKQp0cnVtcF90d2VldHNfZGYkdGV4dCA8LSBzdHJfcmVtb3ZlX2FsbCh0cnVtcF90d2VldHNfZGYkdGV4dCwgcGF0dGVybiA9IGNsZWFuNCkKdHJ1bXBfdHdlZXRzX2RmJHRleHQgPC0gc3RyX3JlbW92ZV9hbGwodHJ1bXBfdHdlZXRzX2RmJHRleHQsIHBhdHRlcm4gPSBjbGVhbjQpCgoKdHJ1bXBfdHdlZXRzX2RmJHRleHRbMToxMF0KCnN0cl9leHRyYWN0X2FsbCh0cnVtcF90d2VldHNfZGYkdGV4dFsxOjIwXSwgcGF0dGVybiA9IGNsZWFuMywgc2ltcGxpZnkgPSBUUlVFKQoKCgpgYGAKYHN0cmluZ3JgIGhhcyBpdHMgb3duIGZ1bmN0aW9uIGZvciB0cmltbWluZyB3aGl0ZXNwYWNlLCBgc3RyX3RyaW1gLCB3aGljaCB5b3UgY2FuIHVzZSB0byBzcGVjaWZ5IHdoZXRoZXIgeW91IHdhbnQgbGVhZGluZyBvciB0cmFpbGluZyB3aGl0ZXNwYWNlIHRyaW1tZWQsIG9yIGJvdGguCgpgYGB7cn0KdHJ1bXBfdHdlZXRzX2RmJHRleHQgPC0gc3RyX3RyaW0odHJ1bXBfdHdlZXRzX2RmJHRleHQsIHNpZGUgPSAiYm90aCIpCgp0cnVtcF90d2VldHNfZGYkdGV4dFsxOjEwXQpgYGAKClNlZSBob3cgd2UgaGF2ZSBhIGNvdXBsZSBleHRyYSBzcGFjZXMgaW4gdGhlIG1pZGRsZSBvZiBzb21lIG9mIG91ciBzdHJpbmdzPyBgc3RyX3NxdWlzaGAgd2lsbCB0YWtlIGNhcmUgb2YgdGhhdCBmb3IgdXMsIGxlYXZpbmcgb25seSBhIHNpbmdsZSBzcGFjZSBiZXR3ZWVuIHdvcmRzLgoKYGBge3J9CnRydW1wX3R3ZWV0c19kZiR0ZXh0IDwtIHN0cl9zcXVpc2godHJ1bXBfdHdlZXRzX2RmJHRleHQpCgp0cnVtcF90d2VldHNfZGYkdGV4dFsxOjEwXQpgYGAKCkFsbCB0aGF0J3MgbGVmdCBpcyB0byBjb252ZXJ0IGFsbCBjaGFyYWN0ZXJzIHRvIGxvd2VyY2FzZSwgYW5kIHRoZW4gd2UgY2FuIHNlZSB0aGUgdG9wIFRydW1wIHdvcmRzIQoKYGBge3J9Cgp0cnVtcF90d2VldHNfZGYkdGV4dCA8LSB0b2xvd2VyKHRydW1wX3R3ZWV0c19kZiR0ZXh0KQoKdHJ1bXBfdHdlZXRzX2RmJHRleHRbMToxMF0KYGBgCgpUbyBnZXQgb3VyIHR3ZWV0cyBpbnRvIGEgd29yZCBsaXN0IHdlIHVzZSB0aGUgc2ltaWxhciBmdW5jdGlvbiB0byBgc3Ryc3BsaXRgLCBgc3RyX3NwbGl0YCwgc3RpbGwgc3BsaXR0aW5nIGJ5IHRoZSBzcGFjZXMgYmV0d2Vlbm4gd29yZHMuIFRoZSBhcmd1bWVudCBgc2ltcGxpZnkgPSBGQUxTRWAgcmV0dXJucyBhIGxpc3Qgb2YgY2hhcmFjdGVyIHZlY3RvcnMgd2hpY2ggd2UgdGhlbiB1bmxpc3QuCgoKYGBge3J9CndvcmRzIDwtIHVubGlzdChzdHJfc3BsaXQodHJ1bXBfdHdlZXRzX2RmJHRleHQsIHBhdHRlcm4g
PSAiICIsIHNpbXBsaWZ5ID0gRkFMU0UpKQpgYGAKCldlIGNhbiBub3cgZG8gb3VyIGFudGlfam9pbiB0byByZW1vdmUgJ3N0b3Agd29yZHMnLCBhbmQgdGFsbHkgb3VyIHJlbWFpbmluZyB3b3JkcyBhbmQgb3JkZXIgdGhlbSBhcyBiZWZvcmUuCgoKYGBge3J9CndvcmRzIDwtIGFudGlfam9pbihkYXRhLmZyYW1lKHdvcmRzKSwgc3RvcCwgYnk9Yygid29yZHMiID0gIndvcmQiKSkKCmBgYAoKYGBge3J9CndvcmRzICU+JSBjb3VudCh3b3JkcykgJT4lIGFycmFuZ2UoZGVzYyhuKSkKCndvcmRzIDwtIHdvcmRzICU+JSBjb3VudCh3b3JkcykgJT4lIGFycmFuZ2UoZGVzYyhuKSkKYGBgCgpIbW1tLi4uIGl0IGxvb2tzIGxpa2Ugd2UgaGF2ZSB0aG9zZSBodG1sIHRhZ3MgaW4gYSBkaWZmZXJlbnQgZm9ybWF0LiBJdCdzIGludGVyZXN0aW5nIHRvIG5vdGUgdGhlc2UgbGl0dGxlIHZhcmlhdGlvbnMgYmVjYXVzZSBubyBtYXR0ZXIgaG93IG11Y2ggeW91IHRyeSB0byBhdXRvbWF0ZSB5b3VyIGFuYWx5c2lzIHRoZXJlIGlzIGFsd2F5cyBnb2luZyB0byBiZSBzb21ldGhpbmcgZnJvbSB5b3VyIG5ldyBkYXRhc2V0IHRoYXQgZGlkbid0IGZpdCB3aXRoIHlvdXIgb2xkIGRhdGFzZXQuIFRoaXMgaXMgd2h5IHdlIG5lZWQgdGhlc2UgZGF0YSB3cmFuZ2xpbmcgc2tpbGxzLiBFdmVuIHRob3VnaCBzb21lIHBhY2thZ2VzIG1heSBoYXZlIGJlZW4gY3JlYXRlZCB0byBoZWxwIHVzIG9uIG91ciB3YXksIHRoZXkgY2FuJ3QgcG9zc2libHkgY292ZXIgZXZlcnkgY2FzZS4gQW5kIHRoZXkgYWxsIHdvcmsgc2xpZ2hseSBkaWZmZXJlbnRseS4KCiFbXShpbWcvMTQ2NzQ4MV8yNDA0MzQ5MjYxMjQyMzJfNTUwMzEwNzcyX24uanBnKQoKPC9icj4KCmBgYHtyfQphZGRfc3RvcCA8LSBkYXRhLmZyYW1lKHdvcmQgPSBjKCJyZWw9XFxub2ZvbGxvd1xcPnR3aXR0ZXIiLCAiaHJlZj1cXGh0dHAvL3R3aXR0ZXJjb20vZG93bmxvYWQvaXBob25lXFwiLCAiaXBob25lPC9hPiIsICI8YSIsICImYW1wOyIsICJocmVmPVxcaHR0cC8vdHdpdHRlcmNvbS8vZG93bmxvYWQvaXBhZFxcIiwgImlwYWQ8L2E+IiksIAogICAgICAgICAgICAgICAgICAgICAgIGxleGljb24gPSAiY3VzdG9tIiwgc3RyaW5nc0FzRmFjdG9ycyA9IEZBTFNFKQoKCnN0b3AgPC0gYmluZF9yb3dzKHN0b3AsIGFkZF9zdG9wKQoKYGBgCgpgYGB7cn0Kd29yZHMgPC0gYW50aV9qb2luKGRhdGEuZnJhbWUod29yZHMpLCBzdG9wLCBieT1jKCJ3b3JkcyIgPSAid29yZCIpKQoKd29yZHMgJT4lIGNvdW50KHdvcmRzKSAlPiUgYXJyYW5nZShkZXNjKG4pKQoKd29yZHMgPC0gd29yZHMgJT4lIGNvdW50KHdvcmRzKSAlPiUgYXJyYW5nZShkZXNjKG4pKQpgYGAKJ3ByZXNpZGVudCcsICdwZW9wbGUnLCAnZmFrZScsICduZXdzJywgJ2RhY2EnLCBkZW1vY3JhdHMnLCAnam9icycsICdvYmFtYScsICdib3JkZXInLCAnZmJpJywgJ2NvbGx1c2lvbicsICdydXNzaWEnLCAnd2FsbCcsICdtZXhpY28nIGFuZCBmdXJ0aGVyIGRv
d24gaXMgJ2JhZCcsICdjcm9va2VkJyBhbmQgJ2hpbGxhcnknLiAKClRydW1wJ3Mgd29yZGNsb3VkIG1pbnVzIGhpcyB0d2l0dGVyIGhhbmRsZS4KYGBge3J9CmxpYnJhcnkoIndvcmRjbG91ZCIpCmxpYnJhcnkoInZpcmlkaXMiKQoKd29yZHNbMjo1MSxdICU+JQogICAgd2l0aCh3b3JkY2xvdWQod29yZHMsIG4sIG9yZGVyZWQuY29sb3JzID0gVFJVRSwgYygzLC41KSxjb2xvcnMgPSB2aXJpZGlzKDUwKSwgdXNlLnIubGF5b3V0ID0gVFJVRSkpCmBgYAoKKioqCl9fQ2hhbGxlbmdlX18gCgoKPGRpdiBzdHlsZT0iZmxvYXQ6bGVmdDttYXJnaW46MCAxMHB4IDEwcHggMCIgbWFya2Rvd249IjEiPgohW10oaW1nL21heHJlc2RlZmF1bHQuanBnKXt3aWR0aD0xNTBweH0KCjwvZGl2PgoKUGljayBvbmUgb2YgdGhlIG90aGVyIHR3ZWV0IGRhdGEgc2V0cyBbX19pbnNlcnQgcG9zc2liaWxpdGllc19fXS4gQ2xlYW4gaXQuIFJlbW92ZSBhbGwgb2YgdGhlIHN0b3Agd29yZHMuIE1ha2UgYSB3b3JkY2xvdWQgb2YgdGhlIHRvcCA1MCB3b3Jkcy4KCgo8L2JyPgo8L2JyPgo8L2JyPgoKKioqCgoKCgoKCgoKCgoKCgoKCi1yb3VuZGluZyBkYXRhIC0gd2UgZG9uJ3QgbmVlZCBpdCB0byB0aGUgdW1wdGVlbnRoIGRlY2ltYWwKLWxvb2sgZm9yIGxvd2VzdCBhbmQgaGlnaGVzdCB2YWx1ZXMgLSBkbyB0aGVzZSBtYWtlIHNlbnNlPyAoc3RhbmRhcmQgZGV2aWF0aW9uPykKLWNhbGxlZCBhIHJhbmdlIGNoZWNrLCBzcGVsbCBjaGVjaywgcmVnZXgKLWRvY3VtZW50IHdoYXQgeW91IGFyZSBkb2luZwotbm8gYWxsIGRhdGEgc2V0cyBhcmUgMTAwJSBjbGVhbgotYWxzbyB0aGUgcGFzdGUgZnVuY3Rpb25zCgotdGhlIGJhc2ljcyBydW5uaW5nIHRocm91Z2ggYSBmdW4gZXhhbXBsZQotIGRvIGEgY2hvc2VuIG9uZSwgc2VlIHdoYXQgb3RoZXIgcHJvYmxlbXMgY29tZSB1cAotcm1hcmtkb3duLCBzeW50YXgsIGV0YwotbWFrZSBhIHBkZiBvZiBjbGVhbmluZyB0aGUgV2VsbGNvbWVUcnVzdCBkYXRhc2V0IChnaXZlIGEgYnJpZWYgb3V0bGluZSBvZiB3aGF0IG5lZWRzIHRvIGJlIGRvbmUpCgoKcmVtZW1iZXIgcmVnZXggdGVzdGVycwpodHRwczovL3JlZ2V4MTAxLmNvbS8KaHR0cHM6Ly9yZWdleHIuY29tLwoKCgotIHNlYXJjaGluZyBmb3IgYSB3b3JkL3BhdHRlcm5zLCBzdWJzZXQgdXNpbmcgY2hhcmFjdGVyIHN0cmluZ3MsIGNvbGxhcHNlIGFuZCBleHBhbmQgY2hhcmFjdGVyIHZlY3RvcnMsIHJlcGxhY2VtZW50LCByZXBsYWNpbmcgTkFzLCBzcGxpdHRpbmcvY29tYmluaW5nIGF0IGEgZGVsaW1pdGVyCgotcmVnZXhwciwgc3RyX2xvY2F0ZQoKCgoKYGBge3J9CgoKc2F2ZSA8LSBzdHJfZXh0cmFjdF9hbGwodHJ1bXBfdHdlZXRzX2RmJHRleHQsICJbI3xAfFs6YWxudW06XV0rKFteXFxzXVtbOmFsbnVtOl1dKyk/Iiwgc2ltcGxpZnkgPSBUUlVFKQojZ2F0aGVyLCBnZXQgcmlkIG9mIGVtcHR5IHN0cmluZ3MKdGVzdCA8LSBnYXRoZXIoYXMuZGF0YS5mcmFtZShzYXZl
KSwgdmFsdWUgPSAid29yZCIpICU+JSBmaWx0ZXIod29yZCAhPSAiIikKCgpgYGAKCkl0IGxvb2tzIGxpa2Ugd2UgbmVlZCBzb21lIG1vcmUgZGF0YSBjbGVhbmluZy4gRmlyc3QsIGxldCdzIGdldCByaWQgb2YgZXZlcnl0aGluZyB3aXRoIG51bWJlcnMuCgpgYGB7cn0KdHJ1bXBfd29yZHMgJT4lIHNlbGVjdCh3b3JkKSAlPiUgc3RyX3JlbW92ZSgiWzAtOV0rIiwgc2ltcGxpZnkpCgojcmVtb3ZlcyBudW1iZXJzCnN0cl9yZW1vdmVfYWxsKHRydW1wX3dvcmRzJHdvcmQsICJbMC05XSoiKQojc3RpbGwgaGF2ZSBwdW5jdHVhdGlvbiBiZWZvcmUgbnVtYmVycwpzdHJfcmVtb3ZlX2FsbCh0cnVtcF93b3JkcyR3b3JkLCAiWzAtOV0uKiIpCgoKYGBgCkl0J3MgbG9va2luZyBiZXR0ZXIuIFdlIGhhdmUgYSBzaW5nbGUgaGFzaHRhZy4gIidzIiBlbmRpbmdzIHNob3VsZCBiZSByZW1vdmVkIC0gY291bGQgbWF0Y2ggb3RoZXIgd29yZHMsIG9yIGlmIGEgY29udHJhY3Rpb24gd2lsbCBiZSByZW1vdmVkIHZpYSBzdG9wd29yZHMgbGlzdC4gVGhlcmUgaXMgYWxzbyBhICd1LnMnIHdoZXJlIHdlIGNhbiBnZXQgcmlkIG9mIHRoZSBwZXJpb2QuIElmIGFueW9uZSBjYW4gZmluZCBvdXQgaG93IHRvIHJlbW92ZSB0aGUgYXBvc3Ryb3BoZSBhbmQgbm90IHRoZSBwZXJpb2QsIGxldCBtZSBrbm93LgoKYGBge3J9CnN0cl9yZW1vdmVfYWxsKHRydW1wX3dvcmRzJHdvcmQsICIjJCIpCgoKZ3N1YigiW1s6cHVuY3Q6XV1zJCIsICIiLCB0cnVtcF93b3JkcyR3b3JkKQojbGV0J3MgY2hlY2sgdG8gc2VlIHdoYXQgdGhpcyB3aWxsIHJlbW92ZS4KZ3JlcCgiW1s6cHVuY3Q6XV1zJCIsIHRydW1wX3dvcmRzJHdvcmQsIHZhbHVlPVRSVUUpCiN0aGUgb25seSB0aGluZyB0aGF0IGlzbid0IGFuICdzIGlzIHUucywgd2UgZG9uJ3Qgd2FudCB0aGlzIHJlbW92ZWQgYW5kIHRydW5jYXRlZCB0byAndScsIGJ1dCB3ZSBhbHNvIGRvbid0IHdhbnQgdG8ganVzdCByZW1vdmUgdGhlIHBlcmlvZCBmaXJzdCBiZWNhdXNlIHdlIHdhbnQgdG8gcmV0YWluIHRoYXQgaXQgbWVhbnMgdW5pdGVkIHN0YXRlcyBhbmQgbm90IHVzLiBTbywgSSBhY3R1YWxseSBjb3VsZG4ndCBmaW5kIGEgcmVnZXggcHVuY3R1YXRpb24gc29sdXRpb24gdG8gdGhpcy4gQk9OVVMgcG9pbnRzIGlmIHlvdSBkby4gSW5zdGVhZCwgd2UgYXJlIGdvaW5nIHRvIFJFUExBQ0UgInUucyIgd2l0aCAidXNhIgoKdHJ1bXBfd29yZHMkd29yZCA8LSBzdHJfcmVwbGFjZSh0cnVtcF93b3JkcyR3b3JkLCAidS5zIiwgInVzYSIpCgojY2hlY2sKZ3JlcCgiW1s6cHVuY3Q6XV1zJCIsIHRydW1wX3dvcmRzJHdvcmQsIHZhbHVlPVRSVUUpCgp0cnVtcF93b3JkcyR3b3JkIDwtIHN0cl9yZW1vdmVfYWxsKHRydW1wX3dvcmRzJHdvcmQsICJbWzpwdW5jdDpdXXMkIikKdHJ1bXBfd29yZHMkd29yZCA8LSBzdHJfcmVtb3ZlX2FsbCh0cnVtcF93b3JkcyR3b3JkLCAiIyQiKQoKI2NoZWNrCmdyZXAoIltbOnB1bmN0Ol1dcyQiLCB0cnVt
cF93b3JkcyR3b3JkLCB2YWx1ZT1UUlVFKQoKCiNvbmNlIHdlIGtub3cgd2UndmUgZ290IGl0IHJpZ2h0IHdlIGNhbiBmaWx0ZXIgdGhlIGRhdGEgZnJhbWUKdHJ1bXBfd29yZHMgPC0gdHJ1bXBfd29yZHMgJT4lIG11dGF0ZSh3b3JkID0gc3RyX3JlbW92ZV9hbGwodHJ1bXBfd29yZHMkd29yZCwgIlswLTldLioiKSkgJT4lIGZpbHRlcih3b3JkICE9ICIiKQpgYGAKCgoKCgoKCgpJIGxvb2tlZCBmb3IgYSBkYXRhc2V0IGZvciBkYXRhIGNsZWFuaW5nIGFuZCBmb3VuZCBpdCBpbiBhIGJsb2cgdGl0bGVkICJCaW9sb2dpc3RzOiB0aGlzIGlzIHdoeSBiaW9pbmZvcm1hdGljaWFucyBoYXRlIHlvdS4uLiIuIFRoZSBtYWluIGFuZCBjb21tb24gaXNzdWUgd2l0aCB0aGlzIGRhdGFzZXQgaXMgdGhhdCB3aGVuIGRhdGEgZW50cnkgd2FzIGRvbmUgdGhlcmUgd2FzIG5vIHN0cnVjdHVyZWQgdm9jYWJ1bGFyeSAtIG1lYW5pbmcgdGhhdCBwZW9wbGUgY291bGQgdHlwZSBpbiB3aGF0ZXZlciB0aGV5IHdhbnRlZCBpbnN0ZWFkIG9mIHVzaW5nIGRyb3Bkb3duIG1lbnVzIHdpdGggbGltaXRlZCBvcHRpb25zLCBvciBnaXZpbmcgYW4gZXJyb3IgaWYgc29tZXRoaW5nIGlzIGZvcm1hdHRlZCBpbmNvcnJlY3RseSwgb3Igc3RpcHVsYXRpbmcgc29tZSBydWxlcyAoaWUuIG11c3QgYmUgYWxsIGxvd2VyY2FzZSwgdXBwZXJjYXNlLCBubyBudW1iZXJzLCBzcGFjaW5nLCBldGMuKS4gSSBtdXN0IGFkbWl0IEkgaGF2ZSBiZWVuIGd1aWx0eSBvZiBtZXNzaW5nIHdpdGggcGVvcGxlIHdobyBoYXZlIG1hZGUgZGF0YWJhc2VzIHdpdGhvdXQgcnVsZXMuIEZvciBleGFtcGxlLCBpbiBnaXZpbmcgdGhlIGVtZXJnZW5jeSBjb250YWN0IHRoZXJlIHdhcyBhIGxpbmUgdG8gaW5wdXQgJ1JlbGF0aW9uc2hpcCcsIHdoaWNoIGNvdWxkIGVhc2lseSBoYXZlIGJlZW4gYSBkcm9wZG93biBtZW51ICdwYXJlbnQsIHBhcnRuZXIsIGZyaWVuZCwgb3RoZXInLCBidXQgaW5zdGVhZCBJIHdhcyBhbGxvd2VkIHRvIHdyaXRlIGluIGEgZnJlZSB0ZXh0IGxpbmUgJ2xpZmVsb25nIGtpbmRyZWQgc3Bpcml0LCBzb3VsbWF0ZSBhbmQgZG9nZ3ktZGFkZHknLiBJIGRvbid0IHRoaW5rIGFueW9uZSBoZXJlIHdhcyB0cnlpbmcgdG8gYmUgYSBudWlzYW5jZSwgdGhpcyBpcyBqdXN0IGEgY29uc2VxdWVuY2Ugb2YgcG9vciBkYXRhIGNvbGxlY3Rpb24uIFRoZXJlIGlzIGEgUkVBRE1FIGZpbGUgdG8gZ28gd2l0aCB0aGlzIHNwcmVhZHNoZWV0IGlmIHlvdSBoYXZlIHF1ZXN0aW9ucyBhYm91dCB0aGUgZGF0YSBmaWVsZHMuICAKCmh0dHA6Ly93d3cub3BpbmlvbWljcy5vcmcvYmlvbG9naXN0cy10aGlzLWlzLXdoeS1iaW9pbmZvcm1hdGljaWFucy1oYXRlLXlvdS8gICAgIApodHRwczovL2ZpZ3NoYXJlLmNvbS9hcnRpY2xlcy9XZWxsY29tZV9UcnVzdF9BUENfc3BlbmRfMjAxMl8xM19kYXRhX2ZpbGUvOTYzMDU0CgpXaGF0IEkgd2FudCB0byBrbm93IGlzOiAKMS4gTGlzdCA1IHByb2JsZW1zIHdp
dGggdGhpcyBkYXRhIHNldC4KMS4gV2hpY2ggcHVibGlzaGVyIGlzIHRoZSBtb3N0IGV4cGVuc2l2ZSB0byBwdWJsaXNoIHdpdGg/CjEuIFdoaWNoIGpvdXJuYWwgaXMgdGhlIG1vc3QgZXhwZW5zaXZlIHRvIHB1Ymxpc2ggd2l0aD8gSXMgdGhpcyBieSB0aGUgc2FtZSBwdWJsaXNoZXI/ICAgICAgICAgICAgICAgICAgCjEuIENvbnZlcnQgc3RlcmxpbmcgdG8gQ0FELiBXaGF0IGlzIHRoZSBtZWRpYW4gY29zdCBvZiBwdWJsaXNoaW5nIHdpdGggRWxzZXZpZXIgaW4gQ0FEPwoKVGhlIGJsb2dnZXIncyBvcGluaW9uIG9mIGNsZWFuaW5nIHRoaXMgZGF0YXNldDoKCidJIG5vdyBoYXZlIG5vIGhhaXIgbGVmdDsgSeKAmXZlIHRvcm4gaXQgYWxsIG91dC4gIE15IHRlZXRoIGFyZSBqdXN0IHN0dW1wcyBmcm9tIGV4Y2Vzc2l2ZSBnbmFzaGluZy4gIE15IGZhaXRoIGluIGh1bWFuaXR5IGhhcyBiZWVuIGRlc3Ryb3llZCEnCgpEb24ndCBnZXQgdG8gdGhpcyBwb2ludC4gVGhlIGRhdGFzZXQgZG9lc24ndCBuZWVkIHRvIGJlIHBlcmZlY3QuIEp1c3QgZG8gd2hhdCB5b3UgZ290dGEgZG8gdG8gYW5zd2VyIHRoZXNlIHF1ZXN0aW9ucy4gIAoKTm90ZSB0byBzZWxmOiBUaGlzIG1heSBiZSB0b28gdG91Z2ggLSBzZWUgaG93IGxvbmcgaXQgdGFrZXMgdG8gZG8uCgpBcHByb3hpbWF0ZSB0aW1lOiAyIGhvdXJzIHBlciBsZXNzb24KCiBfRWFjaCBsZXNzb24gd2lsbCBoYXZlOl8KIAogLSBDb21wcmVoZW5zaW9uIHF1ZXN0aW9ucyBhcyB3ZSBnbyBhbG9uZyBvbiBTb2NyYXRpdmUuCiAtIEhvdyB0byByZWFkIGhlbHAgcGFnZXMgb25saW5lLgogLSBHaXZlIHRoZSBjbGFzcyBhIGZ1bmN0aW9uIG5vdCBwcmV2aW91c2x5IHVzZWQgZHVyaW5nIHRoZSBsZXNzb24gYW5kIGhhdmUgdGhlbSBmaWd1cmUgb3V0IHdoYXQgaXQgZG9lcyBhbmQgaG93IHRvIHVzZSBpdC4KIC0gRWFjaCBsZXNzb24gd2lsbCBzdGFydCBmcm9tIGFuIGV4Y2VsIHNwcmVhZHNoZWV0IHdpdGggaW1wZXJmZWN0IGRhdGEuCgoqKioKCl9SIG1hcmtkb3duIGFuZCBrbml0cl8KCi0gciBtYXJrZG93biBzeW50YXgKICAgICsgIG1ha2luZyB0aGluZ3MgcHJldHR5OiBhZGRpbmcgdGFibGUgb2YgY29udGVudHMsIGltYWdlcywgaHlwZXJsaW5rcywgdXJscwotIGtuaXRyIGNvZGUgY2h1bmsgb3B0aW9ucwogICAgKyBzdXBwcmVzc2luZyBwa2cgbG9hZCB3YXJuaW5ncywgZXZhbCA9IFQvRiwgcmUtcnVubmluZyBzb21lIGNodW5rcyB3aGlsZSBrZWVwaW5nIG90aGVycyBpbiBtZW1vcnkKICAgICsgdGFibGVzIGluIGtuaXRyCi0gcmVuZGVyaW5nIHRvIHBkZiwgaHRtbCwgd29yZCBkb2N1bWVudHMgKGFueSBpbnRlcmVzdCBpbiBzbGlkZXM/KQotIHNoYXJpbmcgb24gUnB1YnMKCl9fQ2hhbGxlbmdlOl9fICAgICAgCgpUYWtlIHRoZSBvcmlnaW5hbCBnYXBtaW5kZXIgZGF0YXNldCBhbmQgY292ZXJ0IGl0IHRvIHRoZSAnY2xlYW4nIGRhdGFzZXQgZm91bmQgaW4gdGhlIGdhcG1pbmRlciBwYWNrYWdl
IC8gZmluZCBzb21lIGhvcnJpYmxlIGRhdGFzZXQgdG8gY2xlYW4uIFByZXNlbnQgaW4gYSBrbml0ciB0YWJsZSwgZXhwbGFpbmluZyBzb21lIG9mIHlvdXIgZGF0YSBjbGVhbmluZyBjaGFsbGVuZ2VzIGluIHJtYXJrZG93bi4gS25pdCB0aGUgZG9jdW1lbnQgdG8gYSBwZGYuCiAgIApfX1Jlc291cmNlczpfXyAgICAgCmh0dHA6Ly9zdGF0NTQ1LmNvbS9ibG9jazAyMl9yZWd1bGFyLWV4cHJlc3Npb24uaHRtbApodHRwOi8vc3RhdDU0NS5jb20vYmxvY2swMjdfcmVndWxhci1leHByZXNzaW9ucy5odG1sCmh0dHA6Ly9zdGF0NTQ1LmNvbS9ibG9jazAyOF9jaGFyYWN0ZXItZGF0YS5odG1sICAgICAKaHR0cDovL3I0ZHMuaGFkLmNvLm56L3N0cmluZ3MuaHRtbApodHRwOi8vd3d3Lmdhc3RvbnNhbmNoZXouY29tL0hhbmRsaW5nX2FuZF9Qcm9jZXNzaW5nX1N0cmluZ3NfaW5fUi5wZGYgICAgIApodHRwOi8vdmFyaWFuY2VleHBsYWluZWQub3JnL3IvdHJ1bXAtdHdlZXRzLyAgICAgCmh0dHA6Ly93d3cub3BpbmlvbWljcy5vcmcvYmlvbG9naXN0cy10aGlzLWlzLXdoeS1iaW9pbmZvcm1hdGljaWFucy1oYXRlLXlvdS8gICAgIApodHRwczovL2ZpZ3NoYXJlLmNvbS9hcnRpY2xlcy9XZWxsY29tZV9UcnVzdF9BUENfc3BlbmRfMjAxMl8xM19kYXRhX2ZpbGUvOTYzMDU0CgojI1Bvc3QtTGVzc29uIEFzc2Vzc21lbnQKKioqCl9RdWVzdGlvbnNfCgotIFNwZWVkOiBUb28gc2xvdywgdG9vIGZhc3QsIGp1c3QgcmlnaHQKLSBDb250ZW50OiBUb28gZWFzeSwgdG9vIGhhcmQsIGp1c3QgcmlnaHQKLSBGcm9tIHRoZSBkZXNjcmlwdGlvbiBvZiB0aGUgbGVzc29uLCB0aGUgY29udGVudCB3YXMgd2hhdCBJIGV4cGVjdGVkIHRvIGxlYXJuLiBUL0YKLSBXaGF0IHdhcyB0aGUgbW9zdCB1c2VmdWwgdGhpbmcgeW91IGxlYXJuZWQ/Ci0gV2hhdCB3YXMgdGhlIGxlYXN0IHVzZWZ1bCB0aGluZz8KLSBDb21tZW50cy9zdWdnZXN0aW9ucyBmb3IgaW1wcm92ZW1lbnQuCgoKIyNOb3Rlcwo=